the protocols introduced are based on the following general assumptions:

1. Nodes within a cluster are fully connected (e.g., nodes under leader 2, 3, or 4), including nodes belonging to different subtrees (e.g., all nodes under ...).
2. Nodes in different clusters (or machines) may not see each other directly (e.g., nodes under leader 2 cannot see nodes under leader 3).
3. All machines in the system must be fully connected; that is, each machine can see at least one node from each of the other machines.

Before getting into the details of the structure and

required mechanisms, we define some basic terms that will

be used throughout this section.

1. Virtual cluster: A collection of nodes under one leader (a subtree). A large cluster is divided into multiple virtual clusters to make communication and management more efficient (e.g., the agents led by leader 4).
2. Head node: In some cases, a cluster has a single node that is connected to other machines or clusters. This node is called the head node and has a leader agent residing on it.
3. Local node: From the viewpoint of an agent/leader, the node where it resides is its local node.
4. Remote node: From the viewpoint of an agent/leader, nodes other than its local node are remote nodes.

The protocols introduced here require a number of standard

control messages that the agents use to communicate and

exchange information. These messages, referred to as the

middleware control messages (MCM), are defined here.

1. Leader Advertisement Message (LAM): A broadcast message sent by a newly created leader to inform other existing leaders of its birth. The LAM contains the leader's ID (a unique identifier acquired at startup) and its address information.
2. Agent Monitor (AM): A periodic message sent by leaders to one another and to descendant agents to check whether they still exist.
3. Leader Advertisement Acknowledgment Message (LAAM): Sent by a leader upon receiving a LAM or AM from another leader. The LAAM contains the respondent's ID and address information.
4. Agent Activation Message (AAM): Sent by a leader to activate descendant agents. It contains the leader's ID and address information, in addition to an activation command.
5. Agent Monitor Acknowledgment (AMA): Sent by an agent in response to an AM or AAM. It contains the sender's ID, address, and resource information.
6. Leader Not Responding Message (LNRM): Sent by a leader that does not receive an LAAM from another leader in response to an AM. It is sent to all leaders at the same level and to the leader's parent, if one exists. It contains the sender's ID and the nonresponding leader's ID.
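The message definitions above can be modeled as plain data records. The following Python sketch is illustrative only: the text specifies what information each message carries, not a wire format, so the field names (`sender_id`, `address`, `command`, `resources`, `unresponsive_id`), the string address format, and the helper `acknowledge` are assumptions. The contents of an AM are not stated in the text; the sketch assumes it carries at least the sender's ID.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MCMType(Enum):
    LAM = "Leader Advertisement Message"
    AM = "Agent Monitor"
    LAAM = "Leader Advertisement Acknowledgment Message"
    AAM = "Agent Activation Message"
    AMA = "Agent Monitor Acknowledgment"
    LNRM = "Leader Not Responding Message"


@dataclass
class MCM:
    """One middleware control message (field layout is hypothetical)."""
    kind: MCMType
    sender_id: int
    address: Optional[str] = None          # LAM, LAAM, AAM, AMA
    command: Optional[str] = None          # AAM only: activation command
    resources: Optional[dict] = None       # AMA only: resource information
    unresponsive_id: Optional[int] = None  # LNRM only


def acknowledge(msg, my_id, my_address, is_leader, resources=None):
    """Return the reply the definitions call for, or None if none is defined."""
    # A leader answers a LAM or AM from another leader with an LAAM.
    if is_leader and msg.kind in (MCMType.LAM, MCMType.AM):
        return MCM(MCMType.LAAM, my_id, my_address)
    # An agent answers an AM or AAM with an AMA carrying its resources.
    if not is_leader and msg.kind in (MCMType.AM, MCMType.AAM):
        return MCM(MCMType.AMA, my_id, my_address, resources=resources or {})
    return None
```

For example, `acknowledge(MCM(MCMType.LAM, 2, "host-a:9000"), 3, "host-b:9000", is_leader=True)` yields an LAAM from leader 3, whereas an agent receiving the same LAM produces no reply.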

For the agents to operate efficiently, they need a startup

protocol to automatically identify and communicate with

one another. The initial stage requires manual installation of

the first leader agents on the head nodes. The leaders then

begin the startup and automatic configuration phase.
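As a rough illustration of how leaders could discover one another with the LAM and LAAM messages defined earlier, the sketch below simulates the exchange in memory. It is not the paper's startup protocol: the `Leader` class, its `known_leaders` table, and the use of direct method calls in place of a network broadcast are all assumptions made for the example.

```python
class Leader:
    """Hypothetical leader agent that tracks the other leaders it has heard from."""

    def __init__(self, leader_id, address):
        self.leader_id = leader_id
        self.address = address
        self.known_leaders = {}  # leader ID -> address (assumed bookkeeping)

    def on_lam(self, sender_id, sender_address):
        """Handle a LAM: record the new leader and answer with an LAAM."""
        self.known_leaders[sender_id] = sender_address
        return ("LAAM", self.leader_id, self.address)

    def advertise(self, reachable_leaders):
        """Send a LAM to every reachable leader and record each LAAM reply."""
        for other in reachable_leaders:
            if other is self:
                continue  # a leader does not answer its own advertisement
            kind, other_id, other_address = other.on_lam(
                self.leader_id, self.address
            )
            if kind == "LAAM":
                self.known_leaders[other_id] = other_address


# Two manually installed leaders, followed by a newly created third one.
leaders = [Leader(2, "machine-a:7000"), Leader(3, "machine-b:7000")]
leaders[0].advertise(leaders)
newcomer = Leader(4, "machine-c:7000")
newcomer.advertise(leaders)
```

After the exchange, every leader holds the ID and address of each of the others, which is the connectivity information the later monitoring step relies on.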

1. Each leader is responsible for performing the following tasks:
   a. Execute the startup protocol to automatically acquire connectivity and operational information in the system.
   b. Periodically perform availability checks of the leaders and descendant agents. If a leader or agent does not respond, activate the leader recovery or agent update protocol.
   c. Perform object routing for other agents to ensure full connectivity with other clusters and machines in the system. Many routing protocols can be adapted for this system, but the discussion of the routing details is beyond the scope of this paper.
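The availability check in task (b) can be sketched as a single monitoring round. This is a minimal sketch under stated assumptions: `probe` is a hypothetical transport hook that sends an AM and reports whether the matching acknowledgment (LAAM or AMA) arrived in time, and the action tuples returned are illustrative placeholders for the leader recovery and agent update protocols, which are not specified here. The sketch also assumes that the LNRM is not sent to the nonresponding leader itself.

```python
def availability_check(leader_id, peer_leaders, agents, parent_id, probe):
    """Run one round of AM probes and return the follow-up actions.

    probe(target_id) -> bool is a hypothetical hook: True means the target
    acknowledged the AM (LAAM from a leader, AMA from an agent).
    """
    actions = []
    for peer in peer_leaders:
        if probe(peer):
            continue
        # No LAAM received: send an LNRM to the other same-level leaders
        # and to this leader's parent, if one exists.
        recipients = [p for p in peer_leaders if p != peer]
        if parent_id is not None:
            recipients.append(parent_id)
        for recipient in recipients:
            payload = {"sender_id": leader_id, "unresponsive_id": peer}
            actions.append(("LNRM", recipient, payload))
        actions.append(("START_LEADER_RECOVERY", peer, None))
    for agent in agents:
        if not probe(agent):
            # No AMA received: hand over to the agent update protocol.
            actions.append(("START_AGENT_UPDATE", agent, None))
    return actions


# Leader 2 monitors peer leaders 3 and 4 and its agents 41 and 42.
alive = {3, 41}
actions = availability_check(
    2, [3, 4], [41, 42], parent_id=1, probe=lambda target: target in alive
)
```

In a running system the leader would call such a function on a timer; the period and the timeout behind `probe` are tuning parameters that the text leaves open.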