Cluster Environment Observer (CEO)

A Tool for Monitoring and Administration of Heterogeneous Clusters

The problem

Clusters of computers are becoming increasingly popular as high-performance, reliable computing platforms. The usual approach is to build a small cluster with a few nodes and add more nodes as computational requirements grow. This on-demand growth tends to make clusters heterogeneous in both hardware (for instance, node processing power or memory size) and software configuration. It is also quite common to find a cluster where some of the machines run the Unix operating system while others run NT.

Existing monitoring tools target only one of these two systems, which forces system administrators to use different tools for different clusters, or even for different nodes of the same cluster. This goes against the idea that a cluster should offer a Single-System Image. The ideal solution would be a single monitoring/administration tool that can monitor and administer both Unix (including Linux) and Windows clusters in the same way. Our Cluster Environment Observer aims to offer an environment for monitoring and administering heterogeneous clusters through a single interface.

Our approach

There are several possible approaches to solving this problem. The first is to develop a completely new tool that works on both systems. Although this is feasible, it requires too much effort. The other is to take existing tools that work on one of the systems and port part of them so that they also run on the other.

In our case, we picked three tools that already worked on Unix systems and implemented the code needed to run them under NT as well, giving the image of a single system.

Server-based tools

After studying many monitoring/administration tools, we found that most of them follow the client/server approach. Each node to be monitored or administered runs a process that knows what is happening on that machine. This server reports to another process (one per system) that is in charge of interacting with the system administrator.

This architecture has simplified our task considerably, as we only needed to implement the server that runs on and manages each node. The user interfaces are the existing ones.

Furthermore, as these servers usually offer the same kind of information, we have been able to implement a single server that is fully compatible with three existing monitoring/administration tools.

Tools that can now be used on both systems (NT and Unix):
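The per-node server idea can be illustrated with a minimal sketch. This is not the CEO server or the protocol of any of the supported tools: the JSON line format, the function names (collect_node_status, query_node, demo) and the choice of metrics are assumptions made for illustration only. It uses just the Python standard library, so the same code runs on Unix and Windows nodes.

```python
import json
import os
import platform
import socket
import socketserver
import threading
import time


def collect_node_status():
    """Gather a few metrics using only portable standard-library calls."""
    return {
        "host": socket.gethostname(),
        "os": platform.system(),   # e.g. "Linux" or "Windows"
        "cpus": os.cpu_count(),
        "timestamp": time.time(),
    }


class NodeStatusHandler(socketserver.StreamRequestHandler):
    """Answers each connection with one JSON line describing this node.

    The wire format is hypothetical, not that of an existing tool.
    """

    def handle(self):
        line = json.dumps(collect_node_status()) + "\n"
        self.wfile.write(line.encode("utf-8"))


def query_node(host, port):
    """What a monitoring front end would do: connect, read one status line."""
    with socket.create_connection((host, port), timeout=5) as conn:
        with conn.makefile("r", encoding="utf-8") as stream:
            return json.loads(stream.readline())


def demo():
    """Start a node server on the loopback interface and query it once."""
    # Port 0 asks the OS for any free port; a real deployment would fix one.
    with socketserver.TCPServer(("127.0.0.1", 0), NodeStatusHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        status = query_node(*server.server_address)
        server.shutdown()
        return status


if __name__ == "__main__":
    print(demo())
```

In this sketch the front end never needs to know which operating system the node runs; it only sees the reply, which is the property that lets an existing user interface be reused unchanged.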
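One way a single server could serve several front ends is to collect each sample once and render it in whatever form each front end expects. The sketch below shows that pattern only. The three output formats and all names (NodeSample, as_key_value, as_status_line, as_trace_record, render) are hypothetical and are not the actual formats used by PARMON, NetSaint or PARAVER.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSample:
    """One measurement taken on a node (illustrative fields only)."""
    host: str
    cpu_load: float    # fraction of CPU in use, 0.0 to 1.0
    mem_free_mb: int


def as_key_value(sample):
    """Hypothetical key=value line for a polling-style front end."""
    return (f"host={sample.host} cpu_load={sample.cpu_load:.2f} "
            f"mem_free_mb={sample.mem_free_mb}")


def as_status_line(sample, warn_load=0.80):
    """Hypothetical OK/WARNING line for an alerting-style front end."""
    state = "WARNING" if sample.cpu_load >= warn_load else "OK"
    return f"{state} - {sample.host} load {sample.cpu_load:.0%}"


def as_trace_record(sample, timestamp_s):
    """Hypothetical colon-separated record for a trace-style front end."""
    return (f"{timestamp_s}:{sample.host}:"
            f"{sample.cpu_load:.2f}:{sample.mem_free_mb}")


def render(sample, frontend, timestamp_s=0):
    """Render the same sample for the named (hypothetical) front end."""
    if frontend == "key_value":
        return as_key_value(sample)
    if frontend == "status":
        return as_status_line(sample)
    if frontend == "trace":
        return as_trace_record(sample, timestamp_s)
    raise ValueError(f"unknown front end: {frontend}")


if __name__ == "__main__":
    sample = NodeSample(host="node01", cpu_load=0.25, mem_free_mb=512)
    for name in ("key_value", "status", "trace"):
        print(render(sample, name, timestamp_s=100))
```

Because the collection code is shared, supporting a further front end in this design would only mean adding one more formatting function.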

PARMON

NetSaint

PARAVER

Download the tool

STILL UNDER DEVELOPMENT

For more information please send an e-mail to Toni Cortes.

Publications

Still working on them.

People

Oriol Teixió, Universitat Politècnica de Catalunya, Barcelona, Spain
Toni Cortes, Universitat Politècnica de Catalunya, Barcelona, Spain
Rajkumar Buyya, Monash University, Melbourne, Australia

Project Homes

In Australia: http://www.dgs.monash.edu.au/~rajkumar/ClusterObserver/
In Spain: http://www.ac.upc.es/homes/toni/ClusterObserver/CEO.html