A Single System Image Operating System Layer for Clusters or Wide-area Distributed Systems

PhD Research Proposal by Rajkumar Buyya

-----------------------------------------------------------------

1. The rationale for doing the research

The major objectives of my research are the following:

* to build and contribute to the knowledge of operating environments exhibiting a single-system image, in particular, to provide important additions to the knowledge of operating systems for distributed/cluster computing.
* to provide a cost-effective solution for high performance computing, by building a global parallel cluster using an intranet or the Internet.
* to demonstrate the suitability of using neural network concepts and the microkernel approach in constructing distributed operating systems.
* to achieve an advanced-development position in operating systems design.

2. The scope of the proposed research

To move into the real era of high performance computing, the operating environment running on high performance computers built from networks or clusters of workstations/PCs must provide a unified system view, called a Single System Image (SSI). The user need not be aware of the underlying system architecture to use these machines effectively. The operating environment must be familiar (provide the same look and feel as the existing system) and convenient to use. The user must be provided with a view of a globalized file system, processes, and network. This allows the user to access system resources such as memory, processors, and the network transparently, irrespective of whether they are available locally or remotely.

A single-system image can be provided at any one or more of the following levels:

* Hardware Level
* Operating System Level
* Message Passing Interfaces Level
* Language/Compiler Level
* Tools Level

The scope of my research work is confined to building a single system image at the operating system level.
A team of researchers working on this will be able to build a complete distributed operating system exhibiting a single-system image; we should even be able to use the Internet as a global parallel cluster.

3. Overview of the Proposed Work

Clusters of workstations connected by high speed networks are gaining popularity as a platform for cost-effective high performance/parallel computing. The operating environment on such clusters of workstations does not offer a flexible environment for common users to operate. There is a need for an operating system layer which offers the same look-and-feel and ease-of-use as a traditional operating system while still delivering high performance.

In this context, I would like to propose an operating system layer which meets the objectives of a single system image and allows a network or cluster of workstations/PCs, and the Internet, to be used as a global parallel cluster. The proposed scalable system shall provide globalized access to system resources (CPUs, memory, disk, network) through global resource allocation: that is, globalized process management, globalized memory management, a globalized file system, and globalized network access. Any resources in the network shall be accessed seamlessly whether they are at a local or a remote site. The proposed system exhibits a single system image by supporting a parallel file system, parallel commands, fault tolerance, and high availability. The operating system shall incorporate all these features without the need for additional primitives or commands, keeping the existing command formats.

4. Research/Development Methodology

There are two approaches for building the proposed system:

1. Build from scratch
2. Build on top of existing systems

I would like to follow the second approach: build a layer on top of the existing operating systems and perform global resource allocation. This strategy makes the system quickly portable, lets it track vendor software upgrades, and reduces development time.
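
As a purely illustrative sketch of the second approach, and not the design of the proposed system, the Python fragment below models a thin layer that sits above per-node local systems and performs global process placement. The names Node, GlobalAllocator, spawn, and ps are hypothetical, and both the processes-per-CPU load metric and the least-loaded placement policy are assumptions chosen only to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One workstation/PC running its own local OS (hypothetical model)."""
    name: str
    cpus: int
    running: list = field(default_factory=list)

    def load(self) -> float:
        # Processes per CPU: a crude stand-in for a real load metric (assumption).
        return len(self.running) / self.cpus


class GlobalAllocator:
    """Hypothetical layer above the local systems that places work cluster-wide."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def spawn(self, command: str) -> str:
        # Pick the least-loaded node; ties go to the first node listed.
        target = min(self.nodes, key=lambda n: n.load())
        target.running.append(command)
        return target.name

    def ps(self):
        # Single-system-image view: one process list for the whole cluster.
        return [(n.name, cmd) for n in self.nodes for cmd in n.running]


cluster = GlobalAllocator([Node("ws1", cpus=1), Node("ws2", cpus=2)])
placements = [cluster.spawn(f"job{i}") for i in range(3)]
```

The caller issues spawn and ps without naming a machine, which is the transparency property the layer is meant to provide; a real layer would delegate to each node's local operating system rather than to an in-memory list.
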
The layered approach shows that new systems can be built quickly by mapping new services onto the functionality provided by the layer beneath. A single system image is provided by gluing together the local operating systems (such as UNIX variants, Linux, microkernels, etc.) running on each workstation/PC on the network.

The main motivation for using clusters of workstations to build high performance machines is to use existing hardware and protect investment. The same is true for software. There is already tremendous investment in the development of commercial operating systems and their applications. The second approach also avoids re-implementation of a huge amount of incidental code (firmware, drivers, process and memory managers) that already works on commercial systems.

5. The expected outcomes of the research

The proposed work makes it possible to realize a portable cluster/distributed operating system which is dynamically configurable, extensible, and interoperable. It also shows the use of the Internet as a global parallel cluster for cost-effective high performance computing. This work will contribute to the knowledge of distributed operating computing.

6. Reference to any preparatory reading that has been completed

I have knowledge of the following topics to carry out research in the proposed area:

* Advances in operating systems design
* Microkernel based operating systems
* Parallel and Distributed/Cluster Computing: Opportunities and Challenges
* Suitability of Java for Parallel/Distributed Computing
* A case for building a Global Parallel Cluster using Internet
* Experience in implementing a cluster monitoring system in Java; this has resulted in a thorough understanding of the concepts of client-server computing, implementation of multithreaded servers, design and development of software that needs to provide multiple services simultaneously, and kernel processing.
* Extensive survey on the proposed work.
  Please see the enclosed paper: "Single System Image: Need, Approaches, and Supporting HPC Systems", Proceedings of the 4th International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, USA, 1997. I have organized and chaired sessions on single system image at the PDPTA'97 conference and also served as an associate editor of the conference proceedings.

-----------------------------------------------------------------