Grid 2005 - 6th IEEE/ACM International Workshop on Grid Computing

Paper Abstracts

Wide Area Data Replication for Scientific Collaborations

Ann Chervenak, Robert Schuler, Carl Kesselman, Scott Koranda, Brian Moe

Scientific applications require sophisticated data management capabilities. We present the design and implementation of a Data Replication Service (DRS), one of a planned set of higher-level data management services for Grids. The capabilities of the DRS are based on the publication capability of the Lightweight Data Replicator (LDR) system developed for the LIGO Scientific Collaboration. We describe LIGO publication requirements and the LDR capability. Then we describe the design and implementation of the DRS in the Globus Toolkit Version 4.0 environment and present performance results.

Protecting Grid Data Transfer Services with Active Network Interfaces

Onur Demir, Michael R. Head, Kanad Ghose, Madhusudhan Govindaraju

The inherent dynamic and heterogeneous nature of virtual organizations introduces challenging performance issues that need scalable, robust and efficient solutions. To improve throughput of grid data servers under heavy loads or under denial of service attacks, it is important to service requests differentially, giving preference to ongoing or imminent client requests. We show how such features can be efficiently implemented on an active network adapter based gateway that controls access to a pool of backend data servers. We present performance results for a prototype system based on a dual-ported active NIC, and demonstrate that an efficient differentiated service policy can be implemented on such a gateway to minimize the grid service response time and to improve server throughput under heavy loads and denial of service attacks. We test with several network and server loads and show that response times can be maintained at a level similar to normal, low-load conditions.

Authorization and Account Management in the Open Science Grid

Markus Lorch, Dennis Kafura, Ian Fisk, Kate Keahey, Gabriele Carcassi, Tim Freeman

An attribute-based authorization infrastructure developed for the Open Science Grid is presented. The infrastructure integrates existing identity-mapping and group-membership services using concepts prototyped in the PRIMA system. Authorization scenarios for requests to compute and data resources are detailed. A new SAML obligated authorization decision statement is introduced that attaches an XACML obligation to the authorization decision. The use of obligations enables site-centralized, service-independent policy management. Authorization decisions are enforced via a Workspace Service that creates a constrained execution environment configured in accordance with the obligations and other attribute-based information. Finally, an experimental PRIMA authorization service that extends and simplifies the infrastructure is described.

On the Creation & Discovery of Topics in Distributed Publish/Subscribe Systems

Shrideep Pallickara, Geoffrey Fox, Harshawardhan Gadgil

Publish/Subscribe infrastructures have in recent years gained significant traction, with several specifications such as the Java Message Service, WS-Eventing and WS-Notification trying to capture the essence of publish/subscribe systems and enabling the development of interoperable systems. In this paper we present a scheme for the discovery of topics in distributed publish/subscribe systems. The scheme addresses security-related issues such as authorization and provenance in the discovery of these topics. We also include empirical results from our implementation of this scheme to demonstrate the feasibility of this mechanism. The work that we describe here can be used in systems based on JMS, WS-Eventing or WS-Notification.

Grid-Enabling a Vibroacoustic Analysis Application

Brian Bentow, Jon Dodge, Aaron Homer, Christopher Moore, Robert Keller, Matthew Presley, Robert Davis, Jorge Seidel, Craig Lee, Joseph Betser

This paper describes the process of grid-enabling a vibroacoustic analysis application using the Globus Toolkit 3.2.1. This is the first step in a project to grid-enable a suite of tools being developed as a service-oriented architecture for spacecraft telemetry analysis. In this paper we show the advantage of grid-enabling a single computationally intensive tool in a vibroacoustic analysis flow. The result is that using as few as eleven nodes, the tool's runtime improved by a factor of eight. While communication overhead does affect performance, these results also indicate that a coordinated communication and execution scheduler might be able to significantly improve overall efficiency. In the larger context, our experience also shows that the service-oriented architecture approach, using grid computing tools, can provide a more flexible system design, in addition to improved performance and increased utilization of resources. We also provide some lessons learned in using the Globus Toolkit.

Collective Operations for Wide-Area Message Passing Systems Using Adaptive Spanning Trees

Hideo Saito, Kenjiro Taura, Takashi Chikayama

We propose a method for wide-area message passing systems to perform collective operations using dynamically created spanning trees. In our proposal, broadcasts and reductions are performed efficiently using topology-aware spanning trees constructed at run-time; processors autonomously measure latency and bandwidth to create latency-aware trees for short messages and bandwidth-aware trees for long messages. Our spanning trees adapt to topology changes due to the joining or leaving of processors; when processors join or leave a computation, processors repair the spanning trees so that the effective execution of collective operations can continue. With real processors distributed over several clusters, our collective operations performed much better than a topology-unaware implementation, although not quite as well as a static topology-aware implementation. When some processors joined or left a computation, our broadcast temporarily performed poorly for about 8 seconds while the spanning trees adapted to the new topology, but completed successfully even during this time.
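
As a rough illustration of what a latency-aware broadcast tree looks like, the sketch below builds a minimum-latency spanning tree with Prim's algorithm. It is a centralized stand-in, not the authors' distributed, run-time construction; the function name `latency_aware_tree` and the latency matrix are hypothetical.

```python
import heapq


def latency_aware_tree(latency, root=0):
    """Return {child: parent} for a minimum-latency spanning tree (Prim's algorithm)."""
    n = len(latency)
    parent = {}
    visited = {root}
    # Heap entries are (latency, child, parent) for edges leaving the tree.
    heap = [(latency[root][v], v, root) for v in range(n) if v != root]
    heapq.heapify(heap)
    while heap and len(visited) < n:
        _, v, u = heapq.heappop(heap)
        if v in visited:
            continue  # a cheaper edge already attached this node
        visited.add(v)
        parent[v] = u
        for w in range(n):
            if w not in visited:
                heapq.heappush(heap, (latency[v][w], w, v))
    return parent


# Hypothetical measured latencies in milliseconds:
# nodes 0 and 1 share one cluster, nodes 2 and 3 share another.
LATENCY_MS = [
    [0, 1, 50, 52],
    [1, 0, 51, 53],
    [50, 51, 0, 1],
    [52, 53, 1, 0],
]

tree = latency_aware_tree(LATENCY_MS)
```

With these made-up numbers the tree crosses the wide-area link only once (node 0 to node 2), and node 3 is reached through its cluster neighbor. A bandwidth-aware tree for long messages would need a different edge weight and selection rule.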

Policy Administration Control and Delegation using XACML and Delegent

Ludwig Seitz, Erik Rissanen, Thomas Sandholm, Babak Sadighi Firozabadi, Olle Mulmo

In this paper we present a system permitting controlled policy administration and delegation using the XACML access control system. The need for these capabilities stems from the use of XACML in the SweGrid Accounting System, which is used to enforce resource allocations to Swedish research projects. Our solution uses a second access control system, Delegent, which has powerful delegation capabilities. We have implemented limited XML access control in Delegent in order to supervise modifications of the XML-encoded XACML policies. This allows us to use the delegation capabilities of Delegent together with the expressive access level permissions of XACML.

Adaptive Trust Negotiation and Access Control for Grids

Tatyana Ryutov, Clifford Neuman, Noria Foukia, Travis Leithead, Kent Seamons, Li Zhou

Access control in grids is typically accomplished by a combination of identity certificates and local accounts. This approach does not scale as the number of users and resources increases. Moreover, identity-based access control is not sufficient because users and resources may reside in different security domains and may not have pre-existing knowledge about one another. Trust negotiation is well-suited for grids because it allows participants to establish mutual trust based on attributes other than identity. The Adaptive Trust Negotiation and Access Control (ATNAC) framework addresses the problem of access control in open systems. ATNAC is based on the GAA-API, which provides adaptive access control capturing dynamically changing system security requirements. Based on the sensitivity of the access request and a suspicion level associated with the requester, the GAA-API refers to TrustBuilder to establish a sufficient level of trust between the negotiating participants.

A Credential Renewal Service for Long-Running Jobs

Daniel Kouril, Jim Basney

Jobs on the Grid require security credentials throughout their run for accessing Grid resources. However, delegating long-lived credentials to long-running jobs brings increased risk. Additionally, it is often difficult to predict the run-time of jobs on the Grid. We have developed a solution to this problem using the MyProxy online credential repository. Users store their long-lived credentials in a dedicated MyProxy server and delegate short-lived credentials to their jobs. When a job's credential nears expiration, the Workload Management System retrieves a new short-lived credential from the MyProxy server and refreshes the job's credential. The MyProxy server's policy specifies which services may obtain credentials on the user's behalf, and the server logs all accesses for audit purposes. This system has been used for credential renewal in Grids in Europe for over three years. In this paper, we present the system design, describe our experiences, and discuss the security implications of this approach.
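
The renewal step described above can be sketched as a simple threshold check. This is a minimal illustration under stated assumptions: `Credential`, `needs_renewal`, `refresh_if_needed`, the 10-minute threshold, and `fake_repository` are all hypothetical names and values, and no real MyProxy or Workload Management System API is used.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    subject: str
    expires_at: float  # seconds since the epoch


def needs_renewal(cred, now, threshold_s=600.0):
    """True when the credential expires within threshold_s seconds of now."""
    return cred.expires_at - now <= threshold_s


def refresh_if_needed(cred, now, fetch_short_lived, threshold_s=600.0):
    """Return a fresh short-lived credential if the current one nears expiration."""
    if needs_renewal(cred, now, threshold_s):
        return fetch_short_lived(cred.subject, now)
    return cred


def fake_repository(subject, now, lifetime_s=3600.0):
    # Hypothetical stand-in for the online credential repository:
    # it simply issues a credential valid for one hour.
    return Credential(subject, now + lifetime_s)


job_cred = Credential("job-1", expires_at=1000.0)
unchanged = refresh_if_needed(job_cred, now=100.0, fetch_short_lived=fake_repository)
renewed = refresh_if_needed(job_cred, now=500.0, fetch_short_lived=fake_repository)
```

Passing the repository as a callable keeps the renewal policy separate from how credentials are actually retrieved, which is where the server-side policy and audit logging would apply.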

Ad Hoc Grid Security Infrastructure

Kaizar Amin, Gregor von Laszewski, Mike Sosonkin, Armin Mikler, Mike Hategan

This paper describes the ad hoc Grid security infrastructure (AGSI) developed as a part of the Java CoG Kit project. AGSI is capable of supporting several requirements that are specific to ad hoc Grids. It specifically focuses on identity management, identity verification, and authorization control in spontaneous Grid collaborations without pre-established policies or environments. It adopts established community standards with modifications where needed. This paper also discusses the integration of AGSI in an ad hoc Grid implementation. The implementation supports secure collaboration in ad hoc Grids using commodity technologies such as the Java CoG Kit, JXTA, GSI, and XACML.

An End-to-end Web Services-based Infrastructure for Biomedical Applications

Sriram Krishnan, Kim K. Baldridge, Jerry P. Greenberg, Brent Stearn, Karan Bhatia

Web services have gained widespread acceptance in the Grid community as the standard way of exposing application functionality to end-users. They provide accessibility via a multitude of clients, and the ability to enable composition of data and applications in novel ways for facilitating innovation across scientific disciplines. However, issues of diverse data formats and styles which hinder interoperability and integration must be addressed. Providing Web service wrappers for legacy applications alleviates many problems because of the exchange of strongly typed data, defined and validated using XML schemas, that can be used by workflow tools for application integration. In this paper, we describe the end-to-end architecture of such a system for biomedical applications that are part of the National Biomedical Computation Resource (NBCR). We present technical challenges in setting up such an infrastructure, and discuss its back-end resource management, application services, user interfaces, and security infrastructure.

GROCK: High-Throughput Docking using LCG Grid Tools

David Juan García, Patricia Méndez, José R. Valverde

The study of interactions of proteins with other molecules is a major task in understanding living organisms and designing new drugs. GROCK is a portal that facilitates mass screening of potential molecular interactions in the Life Sciences. The main purpose of developing GROCK has been to help users run very large numbers of computational tasks using the power of the Grid. In GROCK we have considered issues of high availability, redundancy, failure recovery and maximal exploitation of available Grid resources. After trying various approaches we have settled on LCG-submitter, a tool developed for the physics LHC project, to meet some of our goals. In this paper we introduce GROCK and analyze its design goals, the challenges found and the solutions we came up with to overcome them.

Highly Latency Tolerant Gaussian Elimination

Toshio Endo, Kenjiro Taura

Large latencies over WAN will remain an obstacle to running tightly coupled parallel applications on Grid environments. This paper takes one such application, dense Gaussian elimination, and describes a parallel algorithm that is highly tolerant to latencies. The key technique is a pivoting strategy called batched pivoting, which incurs much lower synchronization costs than other methods. Although it is a relaxed method that may select pivots other than the "best" ones, we show that it achieves good numerical accuracy. In experiments with random matrices of sizes 64 to 32,768, batched pivoting achieves numerical accuracy comparable to that of partial pivoting. We also evaluate the parallel execution speed of our implementation and show that it succeeds in reducing synchronization costs.

Grid'5000: A Large Scale, Reconfigurable, Controlable and Monitorable Grid Platform

Franck Cappello, Frederic Desprez, Michel Dayde, Emmanuel Jeannot, Yvon Jegou, Stephane Lanteri, Nouredine Melab, Raymond Namyst, Pascale Primet, Olivier Richard, Eddy Caron, Julien Leduc, Guillaume Mornet

Large scale distributed systems like Grids are difficult to study using only theoretical models and simulators. Most Grids deployed at large scale are production platforms that are inappropriate research tools because of their limited reconfiguration, control and monitoring capabilities. In this paper, we present Grid'5000, a 5000-CPU nationwide infrastructure for research in Grid computing. Grid'5000 is designed to provide a scientific tool for computer scientists similar to the large-scale instruments used by physicists, astronomers and biologists. We describe the motivations, design considerations, architecture, control and monitoring infrastructure of this experimental platform. We present configuration examples and performance results for the reconfiguration subsystem.

Policy-based Access Control in Peer-to-Peer Grid Systems

Juliano Freitas da Silva, Luciano Paschoal Gaspary, André Detsch, Marinho Pilla Barcellos

Access control to resources is one of the most important requirements to be satisfied in grid systems that span multiple administrative domains. Despite the efforts of the research community to address this topic, existing approaches do not scale (e.g. in terms of communication overhead) for a large number of nodes (peers) providing resources, as these approaches rely on centralized servers to process access requests. Furthermore, they provide limited, large-grain policy specification functionality and are not committed to employing open, standardized formats to express policies. In this paper, we address these limitations by proposing PeGAC (Peer-to-Peer Grid Access Control), a policy-based, distributed access control mechanism, which can be applied to P2P grid systems. In our proposal, policies are specified using the RBAC model and encoded in XACML.

Application Centric Autonomic BW Control in Utility Computing

Krishna Kant

QoS and congestion performance are crucial to good application performance in a utility computing environment. Unfortunately, proper IP QoS setup is very complex and is either ignored completely or set rather simplistically. It is well known that without an elaborate end to end QoS setup, TCP connections simply divide up the available excess bandwidth equally among themselves under congestion. In this paper we propose autonomic mechanisms that determine BW requirements of various flows of an application and maintain them in appropriate proportion even during congestion. The estimations are done dynamically and thus can easily track changing application requirements. The paper shows that the scheme not only yields close to desired bandwidth allocation, but also significantly reduces packet losses.

ASKALON: A Grid Application Development and Computing Environment

Thomas Fahringer, Radu Prodan, Rubing Duan, Francesco Nerieri, Stefan Podlipnig, Jun Qin, Mumtaz Siddiqui, Hong-Linh Truong, Alex Villazon, Marek Wieczorek

We present the ASKALON environment whose goal is to simplify the development and execution of workflow applications on the Grid.
ASKALON is centered around a set of high-level services for transparent and effective Grid access, including a Scheduler for optimized mapping of workflows onto the Grid, an Enactment Engine for reliable application execution, a Resource Manager covering both computers and application components, and a Performance Prediction service based on a training phase and statistical methods. A sophisticated XML-based programming interface that shields the user from the Grid middleware details allows high-level composition of workflow applications.
ASKALON is used to develop and port workflow applications in the Austrian Grid project. We present experimental results involving two real-world applications that demonstrate the effectiveness of our approach.

An Autonomic Service Architecture for Self-Managing Grid Applications

Hua Liu, Viraj Bhat, Manish Parashar, Scott Klasky

The scale, heterogeneity and dynamism of Grid applications and environments require Grid applications to be self-managing or autonomic. This paper presents the Accord autonomic services architecture that addresses this requirement. Accord enables service and application behaviors and their interactions to be dynamically specified and adapted using high-level rules, based on current application requirements, state and execution context. The design, implementation and evaluation of Accord are presented. An autonomic data streaming application is used to illustrate the self-managing behaviors enabled by Accord.

HIPernet: A Decentralized Security Infrastructure for Large Scale Grid Environments

Pascale Vicat-Blanc Primet, Julien Laganier

Security in Grids calls for fundamental primitives like the secure establishment of dynamic and isolated virtual trust domains. The security mechanisms currently used are generally based on a Public Key Infrastructure global to the grid environment, and a mix of global and local access control policies to make authorization decisions. Such approaches do not scale well with the number of participating domains and entities. In this paper we propose a decentralized approach for securing grid environments that better copes with their inherently distributed nature. The combination of network and operating system virtualization with the Host Identity Protocol and Simple Public Key Infrastructure delegation/authorization certificates makes it possible to create virtual trust domains across multiple shared computer nodes connected by an untrusted network. We analyse how this approach adapts to the vast diversity of trust relationships in the real world and scales better with respect to the number of entities involved.

A Semantic Datagrid for Combinatorial Chemistry

Kieron Taylor, David De Roure, Jonathan W Essex, Jeremy G Frey, Rob Gledhill, Stephen W Harris

The CombeChem project has designed and deployed an e-Science infrastructure using a combination of Grid and Semantic Web technologies. In this paper we describe the datagrid element of the project, which provides a platform for sophisticated scientific queries and a rich record of experimental data and its provenance. This datagrid constitutes a significant deployment of Semantic Web technologies and we propose it as an example of a 'Semantic Datagrid'.

Enabling Information Integration and Workflows in a Grid Environment with Automatic Wrapper Generation

Xuan Zhang, Gagan Agrawal

With a growing trend towards grid-based data repositories and data analysis services, scientific data analysis often involves accessing multiple data sources, and analyzing the data using a variety of analysis programs. One critical challenge in this, however, is that data sources often hold the same type of data in a number of different formats, and also, the formats expected and generated by various data analysis services are often distinct.
This paper presents a new approach, which involves generating wrappers automatically for enabling grid-based information integration and workflows. In this approach, a layout descriptor is used for describing the data format for each data source, as well as the input and output format for each tool or service. We demonstrate our wrapper generation tool with two real case studies.

Toward Seamless Grid Data Access: Design and Implementation of GridFTP on .NET

Jun Feng, Lingling Cui, Glenn Wasson, Marty Humphrey

To date, only Linux-/UNIX-based hosts have been participants in the Grid vision for seamless data access, because the necessary Grid data access protocols have not been implemented on Windows. As part of our larger effort at the University of Virginia to make the Windows platform a first-class participant in all aspects of Grids, this paper describes our experiences and lessons learned while implementing GridFTP on the Microsoft .NET Framework. Our implementation not only supports major extensions of GridFTP v1, it also uniquely implements some features of GridFTP v2 and introduces a new transfer mode specifically designed for the transfer of large collections of small files. Our measured performance is comparable to GT4 GridFTP on both single- and parallel-stream transfers and more efficient than GT4 GridFTP on directory tree transfers. We also identify issues specific to the .NET Framework/Windows platform with regard to security and identify limitations of the current GridFTP protocol.

Authorization of Data Access in Distributed Storage Systems

Derek Feichtinger, Andreas-Joachim Peters

This paper describes an efficient method for access authorization in distributed (Grid) storage systems. Client applications obtain "access tokens" from an organization's file catalogue upon execution of a file name resolution request. Whenever a client application tries to access the requested files, the token is transparently passed to the target storage system. Thus the storage service can decide on the authorization of a request without itself having to contact the authorization service.
The token is protected from access and modification by external parties using public key infrastructure. A prototype using the AliEn Grid file catalogue and xrootd as a data server has been implemented. A detailed description of the prototype implementation is presented.

Peer-to-Peer Discovery of Computational Resources for Grid Applications

Adeep Singh Cheema, Indranil Gupta, Muhammad Moosa

Grid applications need to discover computational resources quickly, efficiently and scalably, but most importantly in an expressive manner. An expressive query may specify a variety of required metrics for the job, e.g., the number of hosts required, the amount of free CPU required on these hosts, and the minimum amount of RAM required on these hosts. We present a peer-to-peer (p2p) solution to this problem, using structured naming to enable both (1) publishing of information about available computational resources, and (2) expressive and efficient querying of such resources. Extensive traces collected from hosts within the Computer Science department at UIUC are used to evaluate our proposed solution. Finally, our solutions are based upon a well known p2p system called Pastry, albeit for Grid applications; this is another step towards the much-needed convergence of Grid and p2p computing.

Grid-Level Computing Needs Pervasive Debugging

Rashid Mehmood, Jon Crowcroft, Steven Hand, Steven Smith

Developing applications for parallel and distributed systems is hard due to their nondeterministic nature; developing debugging tools for such systems and applications is even harder. A number of distributed debugging tools and techniques exist; however, we believe that they lack the infrastructure to scale to large-scale distributed systems, systems with hundreds and thousands of nodes, such as grids. In this paper, we introduce PDB, our prototype debugger, which is based on a hierarchical, scalable architecture. We explain the design of the PDB, highlight its functionality, and demonstrate its usability with two case studies. Before concluding, we discuss portability and extensibility issues for PDB, and discuss some solutions.

A Language-Driven Tool for Fault Injection in Distributed Systems

William Hoarau, Sebastien Tixeuil

In a network consisting of several thousand computers, the occurrence of faults is unavoidable. Being able to test the behavior of a distributed program in an environment where we can control the faults (such as the crash of a process) is an important feature that matters in the deployment of reliable programs. In this paper, we present FAIL (for FAult Injection Language), a language that makes it possible to specify complex fault scenarios in a simple way, while relieving the user from writing low level code. In addition, it is possible to construct probabilistic scenarios (for average quantitative tests) or deterministic and reproducible scenarios (for studying the application's behavior in particular cases). We also present FCI, the FAIL Cluster Implementation, which consists of a compiler, a runtime library and a middleware platform for software fault injection in distributed applications. The preliminary tests that we conducted show that its effective impact at runtime is low.

A Scalable and Efficient Self-Organizing Failure Detector for Grid Applications

Yuuki Horita, Kenjiro Taura, Takashi Chikayama

Failure detection and group membership management are basic building blocks for self-repairing systems in distributed environments, which need to be scalable, reliable, and efficient in practice. Moreover, now that a great number of available resources are becoming more widely distributed, it is increasingly essential that they can be used easily, with less manual configuration, in Grid environments, where connectivity between different networks may be limited by firewalls and NATs.
In this paper, we present a scalable failure detection protocol which self-organizes even in Grid environments. Our failure detector autonomously creates dispersed monitoring relations among participating processes so that any process is monitored by a small number of other processes, and quickly disseminates notifications along the monitoring relations if failures are detected. With simulations and real experiments, we showed that our failure detector achieves high scalability, high reliability, and high efficiency in practice.

Reliability-Aware Resource Management for Computational Grid/Cluster Environments

Chokchai Box Leangsuksun

The collective resource utilization achieved through grid computing is critical to the overall computing capacity of the community and should be guaranteed. In particular, in an existing environment where job sites are cluster systems, a service node failure will cause an outage of the whole system. Current grid fault tolerance techniques only address these issues in an opportunistic fashion. There is a need to complement these approaches by proactively handling failures at the job-site level, ensuring high system availability with no loss of user-submitted jobs. We propose a solution dealing with fault tolerance at the service level, complementing the task-based solutions in grid-aware, cluster-based environments. We discuss various service availability issues related to the grid, as well as some issues and preliminary results obtained while implementing the smart failover feature and the automated grid installation package. Our report describes the performance benefits achieved after implementing our proof of concept to enhance the HA-OSCAR framework.

Scheduling Independent Tasks Sharing Large Data Distributed with BitTorrent

Baohua Wei, Gilles Fedak, Franck Cappello

Data-centric applications are still a challenging issue for Large Scale Distributed Computing Systems. The emergence of new protocols and software for collaborative content distribution over the Internet offers a new opportunity for efficient and fast delivery of high volumes of data. In a previous paper, we investigated BitTorrent as a protocol for Data Diffusion in the context of Computational Desktop Grids. We showed that BitTorrent is efficient for large file transfers and scalable when the number of nodes increases, but suffers from a high overhead when transmitting small files. This paper proposes modeling enhancements of the BitTorrent protocol to overcome this limitation. We evaluate the BitTorrent-aware scheduling heuristics BT-MinMin, BT-MaxMin and BT-Sufferage against a synthetic parameter-sweep application.
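
For readers unfamiliar with the baseline, the sketch below shows the classic Min-Min heuristic on a table of estimated run times. It is not the BitTorrent-aware BT-MinMin variant, whose transfer-time model is not given in the abstract; the function name `min_min` and the example table are hypothetical.

```python
def min_min(etc):
    """Classic Min-Min scheduling.

    etc[t][m] is the estimated run time of task t on machine m.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines  # time at which each machine becomes free
    unscheduled = set(range(len(etc)))
    assignment = {}
    while unscheduled:
        best = None  # (completion_time, task, machine)
        for t in sorted(unscheduled):
            for m in range(n_machines):
                candidate = (ready[m] + etc[t][m], t, m)
                if best is None or candidate < best:
                    best = candidate
        # Schedule the task with the smallest achievable completion time.
        finish, t, m = best
        assignment[t] = m
        ready[m] = finish
        unscheduled.remove(t)
    return assignment, max(ready)


# Hypothetical estimates: 3 tasks, 2 machines.
ETC = [
    [3, 5],
    [2, 4],
    [6, 1],
]
assignment, makespan = min_min(ETC)
```

A data-aware variant would presumably add an estimated data transfer time to each completion-time estimate before taking the minimum; that extension is an assumption here, not something the abstract specifies.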

Automatic Clustering of Grid Nodes

Qiang Xu, Jaspal Subhlok

In a grid-computing environment, resource selection and scheduling depend on the network topology connecting the computation nodes. This paper presents a method to hierarchically group compute nodes distributed across the internet into clusters, and build a logical distance map among clusters. At inter-domain level, distance from landmarks (a small group of distributed reference nodes) is used to map the complex network structure onto a simple geometric space. The position of compute nodes in this geometric space is the basis for partitioning nodes into clusters. For compute nodes within an administrative domain, minimum RTT is used as the metric to partition nodes into clusters. This approach leads to an efficient, scalable and portable method of clustering grid nodes and building a distance map among clusters.
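
The landmark idea can be sketched as follows: each node's position is the vector of its distances to the landmarks, and nodes that are close in that space are grouped together. The greedy radius-based grouping below is an assumption chosen for brevity, not the paper's partitioning method; `cluster_by_landmarks`, the node names, and the RTT values are hypothetical.

```python
import math


def cluster_by_landmarks(coords, radius):
    """Greedy grouping: a node joins the first cluster whose seed is within radius."""
    clusters = []  # list of (seed_vector, [node names])
    for name in sorted(coords):
        vec = coords[name]
        for seed, members in clusters:
            if math.dist(seed, vec) <= radius:
                members.append(name)
                break
        else:
            # No existing cluster is close enough, so this node seeds a new one.
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


# Hypothetical round-trip times (ms) from each node to three landmarks.
RTT_TO_LANDMARKS_MS = {
    "a1": (10.0, 80.0, 120.0),
    "a2": (12.0, 82.0, 118.0),
    "b1": (90.0, 15.0, 60.0),
    "b2": (88.0, 14.0, 63.0),
}

groups = cluster_by_landmarks(RTT_TO_LANDMARKS_MS, radius=10.0)
```

The same positions can then be used to estimate a logical distance between clusters, for example the distance between their seed vectors.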

Efficient Response Time Predictions by Exploiting Application and Resource State Similarities

Hui Li, David Groep, Lex Wolters

In this paper we propose an Instance Based Learning technique to predict application response times on clusters by mining historical workloads. The novelty of our approach is to introduce policy attributes in representing and comparing resource states, where a resource state is defined as the pool of running and queued jobs on the resource at the time a prediction is made. The policy attributes reflect the local resource scheduling policies and they can be automatically discovered by genetic search. The main advantages of this approach compared with scheduler simulation are twofold: first, it performs better in meeting the real-time requirement of Grid resource brokering; second, it is more general because the scheduling policies are learned from past observations. Our experimental results on the NIKHEF LCG production cluster show that acceptable prediction accuracy can be obtained, where the relative prediction errors for response times are between 0.35 and 0.70.

A Quantitative Comparison of Reputation Systems in the Grid

Jason David Sonnek, Jon B. Weissman

Reputation systems have been a hot topic in the peer-to-peer community for several years. In a services-oriented distributed computing environment like the Grid, reputation systems can be utilized by clients to select between competing service providers. In this paper, we selected several existing reputation algorithms and adapted them to the problem of service selection in a Grid-like environment. We also proposed a new reputation algorithm. We performed a quantitative comparison of both the accuracy and overhead associated with these techniques under common scenarios. The results indicate that using a reputation system to guide service selection can significantly improve client satisfaction with minimal overhead, and that the most appropriate algorithm depends on the kinds of anticipated "attacks". Our proposed new algorithm appears to be the approach of choice if clients can misreport service ratings.

Poster Abstracts

Differential Checkpointing for Reducing Memory Requirements in Optimized SOAP Deserialization

Nayef Abu-Ghazaleh, Michael J. Lewis

Differential Deserialization (DDS) is a SOAP optimization technique wherein servers save checkpoints and parser states associated with portions of previously received messages, and use them to avoid full parsing and deserialization of similar new messages. In this paper, we characterize DDS's memory requirements and memory overhead, introduce a new technique for storing only the differences between successive parser states for a message, and demonstrate how this optimization, which we call differential checkpointing, speeds up the DDS optimization and reduces its memory requirements.

Grid Applications for High Energy Physics Experiments

T. Adye, D. Antonioli, R. Barlow, B. Bense, D. Boutigny, C. Bozzi, C.A.J. Brew, R.D. Cowles, E. Feltresi, A. Forti, G. Grosdidier, A. Khan, H. Lacker, E. Luppi, R.K. Mommsen, A. Petzold, D. Smith, J.E. Sundermann, P. Veronesi, F. Wilson, J.C. Werner

This paper discusses the use of the e-Science Grid in providing computational resources for modern international High Energy Physics (HEP) experiments. We investigate the suitability of the current generation of Grid software to provide the resources needed for large-scale simulation of the experiment and analysis of data in the context of a multinational collaboration.

SERVOGrid Complexity Computational Environments (CCE) Integrated Performance Analysis

Galip Aydin, Mehmet S. Aktas, Geoffrey C. Fox, Harshawardhan Gadgil, Marlon Pierce, Ahmet Sayar

In this paper we describe the architecture and initial performance analysis results of the SERVOGrid Complexity Computational Environments (CCE). The CCE architecture is based on a lightly coupled, Service Oriented Architecture approach that is suitable for distributed applications that are tolerant of Internet latencies. CCE focuses on integrating diverse Web and Grid Services for coupling scientific applications to Geographical Information Systems. The services and coupling/orchestrating infrastructure are mapped to problems in geophysical data mining, pattern informatics, and multiscale geophysical simulation.

Web Services and Grid Security Vulnerabilities and Threats Analysis and Model

Yuri Demchenko, Leon Gommans, Cees de Laat, Bas Oudenaarder

The paper provides an overview of available web applications and Web Services security vulnerability models and proposes a classification of the potential Grid and Web Services attacks and vulnerabilities. This is further used to introduce a security model for interacting Grid and Web Services that illustrates how basic security services should interact to provide an attack-resilient multilayer protection in a typical service-oriented architecture. The analysis and the model can be used as a basis for developing countermeasures against known vulnerabilities and proposing security services design recommendations. The paper refers to the ongoing work on middleware and operational security in the framework of the European Grid infrastructure deployment project EGEE and related coordination groups.

Auto-Adaptative Distributed Hash Tables

Arnaud Dury

In this paper we propose a new Distributed Hash Table (DHT) model called Auto-Adaptative Distributed Hash Table. Our model uses distributed profiling of the nodes of the DHT to dynamically adapt the size of the index tables in order to reduce both message consumption and request latency. This work is an evolution of our architecture for a distributed computing model over a DHT that we described in Dury04. We detail our auto-adaptative model and the protocols we implemented and tested, and we give experimental results and a theoretical model of our architecture in simulated networks of up to 640 nodes. We conclude with a discussion of the security of our architecture and of the possible use of dynamic profiling for other distributed computing purposes.

Legacy Code Support for Production Grids

Tamas Kiss, Gabor Terstyanszky, Gabor Kecskemeti, Szabolcs Illes, Thierry Delaitre, Stephen Winter, Peter Kacsuk, Gergely Sipos

In order to improve reliability and to deal with the high complexity of existing middleware solutions, today's production Grid systems restrict the services that can be deployed on their resources. On the other hand, end-users require a wide range of value-added services to fully utilize these resources. This paper describes how legacy code support can be offered as a third-party service for production Grids. The solution, based on the Grid Execution Management for Legacy Code Architecture (GEMLCA), does not require the deployment of additional applications on the Grid resources, or any extra effort from Grid system administrators. The implemented solution was successfully connected to and demonstrated on the UK National Grid Service.

Generic Application Description Model: Toward Automatic Deployment of Applications on Computational Grids

Sébastien Lacour, Christian Pérez, Thierry Priol

Computational grids promise to deliver huge computing power as transparently as the electric power grid supplies electricity. Thus, applications need to be automatically deployed on computational grids. However, various types of applications may be run on a grid (component-based, MPI, etc.), so it may not be wise to design an automatic deployment tool for each specific programming model.
This paper promotes a generic application description model which can express several specific application descriptions. Translating a specific application description into our generic description is a simple task. Developing new planning algorithms and re-using them for different application types then becomes much easier. Moreover, our generic description model makes it possible to deploy applications whose programming model combines several models; parallel components, for instance, encompass both component-based and parallel programming models. Our generic description model is implemented in an automatic deployment tool which can deploy CCM and MPICH-G2 applications.

Semantic Overlay Network for Grid Resource Discovery

Juan Li, Son Vuong

Grid technologies enable the sharing of, and collaboration over, a wide variety of resources. To fully utilize these resources, effective resource discovery mechanisms are necessary. However, the complicated and dynamic characteristics of grid resources make sharing and discovery challenging. In this paper we propose a peer-to-peer (P2P) based overlay network to support efficient resource discovery and querying. The framework is based on the RDF metadata infrastructure, allowing a rich and extensible description of resources. To avoid flooding the network with a query, we propose a comprehensive semantics-based query forwarding strategy, which forwards a query only to semantically related nodes. After the related nodes have been located, the original RDF query is used to perform the final query and retrieval. Results from simulation experiments demonstrate that this architecture is scalable and efficient.

Saleve: Simple Web-Services Based Environment for Parameter Study Applications

Zsolt Molnár, Imre Szeberényi

The goal of the Saleve Project is to develop and evaluate mechanisms and abstractions that may connect the diverse distributed and Grid computing research community to those users who are not familiar with distributed computing as such, but who would simply like to use its results in their everyday tasks. We show a simple web-services based, domain-specific computational framework that integrates smoothly into well-known, traditional user environments, requires learning no new technologies, and brings the power of the Grid directly to the desktop of the end user.

Efficient Mutual Exclusion in Peer-to-Peer Systems

Moosa Muhammad, Adeep Cheema, Indranil Gupta

Traditional peer-to-peer (p2p) applications such as Kazaa and Gnutella have been primarily used for sharing read-only files (such as mpegs and mp3s). Due to a recent surge in the area of Grid computing, there is an urgent need to find efficient ways of protecting consistent and concurrent access to shared resources. This paper introduces two novel protocols for achieving mutual exclusion efficiently in dynamic p2p systems. The protocols are layered atop a distributed hash table (DHT), making them scalable and fault-tolerant. The burden of controlling access to the critical section is also evenly distributed among all the nodes in the network, making the protocols more distributed and easily adaptable to growing networks.
We present experiments comparing our implementations with existing mutual exclusion algorithms. The significant reduction in overall message overhead and the better load-balancing mechanisms make the proposed protocols attractive for current and future p2p and Grid applications.

Comparison of End-to-end Bandwidth Measurement Tools on the 10GigE TeraGrid Backbone

Margaret Murray, Shava Smallen, Omid Khalili, Martin Swany

Both network managers and grid application users need to maximize the bandwidth utilization of distributed applications in the face of complex interactions between network and system hardware and software along the end-to-end paths. Several software tools exist that attempt to unobtrusively measure end-to-end available bandwidth. We present results of the first study to compare these tools on a 10GigE network backbone. We use the Inca test harness deployed on the NSF TeraGrid to collect periodic measurements from a fully connected mesh of node pairs on end-to-end paths between eight TeraGrid sites. We compare results from (1) Network Weather Service (NWS); (2) pathchirp; and (3) pathload. We analyze the collected data to determine the tools' accuracy and efficiency. Finally, we discuss the possible use of bandwidth measurement tools for selecting distributed resources or scheduling jobs.

QoS-Driven Service Configuration in Computational Grids

Sharath Babu Musunoori, Frank Eliassen, Viktor S. Wold Eide

Computational grids promise to provide easy-to-use infrastructures for distributed systems. For real-time distributed applications, it is the quality of service (QoS) that determines their performance. Existing grid solutions do not support QoS issues such as QoS specification and management, and are limited in performance optimization. To address these limitations, we use a platform-managed, QoS-aware service configuration approach. This approach enables application developers to separate functional specification and QoS requirements from implementation decisions that depend on the deployment environment. The middleware platform is responsible for achieving the QoS goals of a grid application service configuration at deployment time. We refer to this as service planning. In this paper we present a service planning framework that treats the QoS demands of real-time multimedia application services as performance objectives. We also present a simple quality deviation model and a service planning algorithm that searches for QoS trade-off points when making configuration decisions. This model is an improvement on a common optimization technique used for QoS management.

HPC-Europa: Towards Uniform Access to European HPC Infrastructures

Ariel Oleksiak, Alisdair Tullo, Paul Graham, Tomasz Kuczynski, Jarek Nabrzyski, Dawid Szejnfeld, Terry Sloan

One of the goals of the HPC-Europa project is to provide users with a Single Point of Access (SPA) to the resources of HPC centers in Europe. To this end, the HPC-Europa Portal is being built to provide transparent and uniform user access to HPC-Europa resources. This portal will hide the underlying complexity and heterogeneity of these resources and the access to them.
In this paper, we present a mechanism for enabling end-users to transparently access services available in the HPC-Europa environment. We also describe the architecture of the SPA based on the GridSphere portal framework. The uniform job submission interface that uses this mechanism and is based on the Job Submission Description Language (JSDL) is also presented. Finally, we discuss the various interoperability problems, in particular those concerning job submission, security and accounting.

A Self-Organized Grouping (SOG) Method for Efficient Grid Resource Discovery

Anand Padmanabhan, Shaowen Wang, Sukumar Ghosh, Ransom Briggs

This paper presents a self-organized grouping (SOG) method that achieves efficient Grid resource discovery by forming and maintaining autonomous resource groups. Each group dynamically aggregates a set of resources that are similar to each other in some pre-specified resource characteristic. The SOG method takes advantage of the strengths of both centralized and decentralized approaches that were previously developed for Grid/P2P resource discovery. The design of the SOG method minimizes the overhead incurred in forming and maintaining groups and maximizes resource discovery performance. The way the SOG method handles resource discovery queries is metaphorically similar to searching for a word in an English dictionary by first identifying its alphabetical group. A series of computational experiments shows that the SOG method achieves more stable (i.e., independent of factors such as resource densities and Grid sizes) and more efficient lookup performance than other existing approaches.

Web-Enabled Grid Authentication in a Non-Kerberos Environment

John-Paul Robinson, Jill Gemmill, Pravin Joshi, Purushotham Bangalore, Yiyi Chen, Silbia Peechakara, Song Zhou, Prahalad Achutharao

UABgrid is a collaboration between academic and administrative IT units at the University of Alabama at Birmingham (UAB). UABgrid provides a web-based grid client environment, access to shared campus computational resources, and user identities defined by the authoritative campus identity provider. A weblogin service leveraging UAB's authoritative identity directory is provided for grid authentication. Previous integrations of institutional identity management and grid authentication depended on a Kerberos environment and use of KX.509. We accomplish similar functionality in a non-Kerberos environment by leveraging our weblogin service to drive applications which require grid credentials. The UABgrid registration process employs the weblogin service to generate certificates and keys signed by our UABgridCA and automatically provisions accounts for UABgrid users based on resource center policies. After successful registration, UABgrid leverages the weblogin service to allow users to access resources and to submit jobs using only a web browser and their familiar username and password.

Addressing Credential Revocation in Grid Environments

Babu Sundaram, Barbara M Chapman

Credential revocation is a critical problem in grid environments and remains unaddressed in existing grid security solutions. We emphasize the importance of credential revocation in grids and present a novel grid authentication solution to the revocation problem. Our model supports instantaneous revocation of both long-term digital identities of hosts/users and short-lived identities of user proxies. With our approach, revocation information is guaranteed to be fresh with high time-granularity. This solution uses mediated RSA (mRSA), adapts Boneh's notion of "semi-trusted mediators" to suit security in virtual organizations and propagates user proxy revocation information as in Micali's NOVOMODO system. We also show how to achieve a configuration-free security model for end-users of the grid and fine-grained management of users' delegation capabilities.

Bridging Organizational Network Boundaries on the Grid

Jefferson Tan, David Abramson, Colin Enticott

The Grid offers significant opportunities for performing wide area distributed computing, allowing multiple organizations to collaborate and build dynamic and flexible virtual organisations. However, existing security firewalls often diminish the level of collaboration that is possible, and current Grid middleware often assumes that there are no restrictions on the type of communication that is allowed. Accordingly, a number of collaborations have failed because the member sites have different and conflicting security policies. In this paper we present an architecture that facilitates inter-organization communication using existing Grid middleware, without compromising the security policies in place at each of the participating sites. Our solutions are built on a number of standard secure communication protocols such as SSH and SOCKS. We call this architecture Remus, and will demonstrate its effectiveness using the Nimrod/G tools.

LASSO: A Grid-Enabled Simulation Optimization Framework

Michael Tryby, Baha Mirghani, Ranji Ranjithan, Kumar Mahinthakumar, Derek Baessler, Nicholas Karonis

In this paper, we report our experiences developing a grid-enabled framework for solving environmental characterization problems. Environmental characterization involves the resolution of unknown system characteristics from observation data, and thus can be categorized as an inverse problem. The solution approach taken here couples environmental simulation models with global search methods and requires the readily available computational resources of the grid for computational tractability. We develop a simple application architecture which utilizes standard communications protocols and the MPI2 API to establish a connection between a centralized search application and forward models running on TeraGrid resources. We report on a preliminary set of results for a groundwater release history reconstruction problem where we observe significant raw performance improvements.

Contact

For further information on Grid 2005, please contact the Program Chair: Daniel S. Katz

This page is maintained by Daniel S. Katz