A Cost-Benefit Approach to Fault Tolerant Communication and Information Access
Yearly Technical Report, July 2002
Objective: Our goal is to develop a Cost-Benefit framework for fault tolerant communication and information access that addresses extremely powerful adversaries that were never handled in the past. The project will develop the theory and algorithms required to overcome strong network attacks, while providing theoretically provable performance bounds. We will build a system that incorporates these algorithms, and that exhibits good performance in practice.
Approach: Our technical approach includes the following innovative topics:
Overlay network infrastructure
In order to better analyze and understand the overlay network paradigm in an environment defined by weaker semantics, we developed a stand-alone prototype called "Spines". It uses a client-daemon architecture and is able to build and automatically configure a dynamic overlay network. The overlay network is designed to scale: it imposes no limit on the number of nodes or links beyond what the routing protocol in use can support.
In the current implementation we provide only unreliable, best-effort semantics, similar to UDP. The overlay network configures itself automatically, grows or shrinks dynamically as nodes join or leave, and supports partitions, merges, crashes and recoveries, including cascades of such events. Applications use the overlay network through a simple API consisting of four calls (connect, disconnect, send and receive), very similar to the UDP socket functions.
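As an illustration only, the shape of such a four-call, UDP-like interface can be sketched in Python as below. The class and method names (ToyDaemon, OverlayClient, and so on) are hypothetical and are not the Spines API; the "daemon" is an in-memory stand-in, so the sketch shows only best-effort datagram semantics, not real networking or routing.

```python
from collections import deque


class ToyDaemon:
    """In-memory stand-in for an overlay daemon (illustration only)."""

    def __init__(self):
        # Maps a client address to the queue of datagrams waiting for it.
        self.mailboxes = {}


class OverlayClient:
    """Hypothetical client exposing connect, disconnect, send and receive."""

    def __init__(self, address):
        self.address = address
        self.daemon = None

    def connect(self, daemon):
        # Register with the daemon so that other clients can reach us.
        self.daemon = daemon
        daemon.mailboxes[self.address] = deque()

    def disconnect(self):
        # Leave the overlay; datagrams still queued for us are discarded.
        if self.daemon is not None:
            self.daemon.mailboxes.pop(self.address, None)
            self.daemon = None

    def send(self, dest, payload):
        # Best effort, like UDP: an unreachable destination drops the
        # datagram without raising an error or retrying.
        box = self.daemon.mailboxes.get(dest)
        if box is None:
            return False
        box.append((self.address, payload))
        return True

    def receive(self):
        # Return the next (source, payload) pair, or None if nothing waits.
        box = self.daemon.mailboxes[self.address]
        return box.popleft() if box else None
```

A client would call connect once, exchange datagrams with send and receive, and call disconnect when done; as with UDP, a send to a peer that has left is simply lost.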
We continued developing Spines by adding reliability semantics, both hop by hop and end to end, similar to TCP. We showed in simulations that using hop-by-hop reliability in an overlay improves the latency of point-to-point TCP connections.
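The intuition behind this latency gain can be illustrated with a deliberately simplified model, which is an assumption of this sketch and not the simulation setup used in the project: a lost packet is recovered after roughly one round trip between the two nodes responsible for retransmission, and loss-detection timeouts, queuing and congestion control are ignored. The function names are hypothetical.

```python
def recovery_delay_end_to_end(hop_latencies_ms):
    """Extra delay when only the endpoints retransmit.

    Assumed model: recovering a loss costs about one round trip over
    the whole path, wherever on the path the packet was lost.
    """
    return 2 * sum(hop_latencies_ms)


def recovery_delay_hop_by_hop(hop_latencies_ms, lossy_hop):
    """Extra delay when each overlay link retransmits locally.

    Assumed model: recovering a loss costs about one round trip over
    the single overlay link on which the packet was lost.
    """
    return 2 * hop_latencies_ms[lossy_hop]


# Five overlay hops of 10 ms one-way latency each, loss on the third hop.
path = [10, 10, 10, 10, 10]
end_to_end_ms = recovery_delay_end_to_end(path)     # 100 ms
hop_by_hop_ms = recovery_delay_hop_by_hop(path, 2)  # 20 ms
```

Under these assumptions the recovery delay depends on the latency of one overlay link rather than on the length of the whole path.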
We implemented the first mechanisms leading us to support multipath routing in Spines. These include the detection of the level of link congestion, necessary for a TCP-friendly pricing mechanism in the Cost-Benefit framework.
Cost benefit decision making
We have rewritten our global flow control in the ns2 simulation. The new implementation scales much better with the number of sites. We are now able to simulate our cost-benefit flow control in network scenarios with up to 1600 sites, 800 different senders sending to several groups each, out of 800 different groups in the system. This gives us the ability to simulate very large overlay networks.
New replication protocol
We continued to work on optimizing and evaluating the replication architecture. We discovered and corrected several performance issues with the engine itself and designed a significant latency optimization to Safe messages in the Spread Toolkit that improved the performance of the replication system as a whole. A complete replicated database solution for the PostgreSQL database was produced and formed the basic version upon which we ran experiments.
We have started to experiment with the replication server we developed. We are now benchmarking the replication server with and without the Postgres database on a local area cluster located in our lab, on the CAIRN wide area network and on general wide area networks using the Emulab facility hosted by the University of Utah.
We have completed a full set of experiments on local and wide area networks. We were able to accurately emulate the physical topology of the CAIRN network on the Emulab machines. The Emulab machines have processing and disk I/O resources comparable to those of our local cluster, and we obtained excellent results for the replication engine that showed the efficiency of the replication architecture and its practical capability for wide area database replication.
To build a framework that will allow us to clearly identify the tradeoffs involved when replicating databases on wide area networks, we developed a more modular version of the replication algorithm (Maintaining Database Consistency in P2P Networks). We are investigating a new metric that will allow us to quantify the opportunity of establishing new replicas in a replicated system. We are also studying the possibility of enhancing the current replication schemes to increase their fault tolerance and scalability in the context of dynamic networks.
Wackamole
We have developed and released Wackamole, a software tool that allows N-Way Fail Over for IP Addresses in a cluster.
Wackamole is a tool that helps make a cluster highly available. It manages a set of virtual IP addresses that should be available to the outside world at all times. Wackamole ensures that exactly one machine within the cluster is listening on each virtual IP address that it manages. If it discovers that particular machines within the cluster are not alive, it will almost immediately ensure that other machines acquire the virtual IP addresses the failed machines were managing. At no time will more than one connected machine be responsible for any virtual IP address.
Wackamole also works toward achieving a balanced distribution of the public IPs within the cluster it manages.
Wackamole uses the membership notifications provided by the Spread Toolkit, also developed in our lab, to generate a consistent state that is agreed upon among all of the connected Wackamole instances. Wackamole uses this knowledge to ensure that each of the public IP addresses served by the cluster is covered by exactly one Wackamole instance.
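The idea that every instance can derive the same address assignment from an agreed membership view can be sketched as follows. This is a minimal illustration and not Wackamole's actual algorithm: the function name reassign_vips, the tie-breaking rule, and the example addresses are all assumptions. Because the function is deterministic and depends only on agreed inputs, every instance that evaluates it independently reaches the same result, so each virtual IP address has exactly one owner.

```python
def reassign_vips(vips, members, current):
    """Return a dict mapping each virtual IP to exactly one live member.

    vips:    iterable of virtual IP addresses managed by the cluster
    members: live members from the agreed membership view
    current: dict of virtual IP -> previous owner (may name dead members)
    """
    alive = sorted(members)
    if not alive:
        return {}
    load = {m: 0 for m in alive}
    assignment = {}
    # Addresses whose owner is still alive stay where they are.
    for vip in sorted(vips):
        owner = current.get(vip)
        if owner in load:
            assignment[vip] = owner
            load[owner] += 1
    # Orphaned addresses go to the least-loaded live member; ties are
    # broken by member name so that all instances choose identically.
    for vip in sorted(vips):
        if vip not in assignment:
            target = min(alive, key=lambda m: (load[m], m))
            assignment[vip] = target
            load[target] += 1
    return assignment


# Member "c" fails; its two addresses are split between "a" and "b".
vips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
before = {"10.0.0.1": "a", "10.0.0.2": "b", "10.0.0.3": "c", "10.0.0.4": "c"}
after = reassign_vips(vips, {"a", "b"}, before)
```

The sketch covers only failover. It does not move addresses toward a newly joined member, so it does not by itself provide the balancing behavior described above.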
Wackamole now supports four platforms: Linux, FreeBSD, Solaris 8, and Mac OSX. Development has also focused on making Wackamole more robust and on fixing deployment issues reported by users. Judging by email queries and downloads, Wackamole has started to make an impact as a different model for IP failover in clusters and is being used in practice.
Archipelago
We have developed an initial version of the Archipelago system. The system allows us to investigate efficient ways to form an extended ad-hoc network of laptops, handhelds, and other wireless capable devices, and bridge it to the Internet. Archipelago constructs a multi-hop dynamic network using the wireless devices of participating users. The current system is fully operational, capable of supporting up to about fifty participants using handhelds (Windows CE) and laptops (Windows and Linux), and up to 10 hops in network diameter.
Later in the year we developed a third generation version of the Archipelago system. This version completely reimplements the system with a modular design that allows pluggable protocols and services such as routing, transport, and security. This will allow us to use Archipelago as a flexible platform for experimentation with specialized routing protocols and the cost-benefit framework. It also allows us to use it in non-wireless, or hybrid wired-wireless environments.
Current Plans: Our plan for FY 2003 includes the following:
In August 2001 we released version 1.0.0 of Wackamole, an NxWay fail-over for IP addresses in a cluster. Version 1.0.0 supports the Linux operating system. Wackamole is available at www.backhand.org/wackamole.
On November 5, 2001 we released version 1.2.0 of Wackamole, an NxWay fail-over for IP addresses in a cluster. Version 1.2.0 supports the Linux, FreeBSD, Solaris 8, and Mac OSX operating systems. Wackamole is available at www.backhand.org/wackamole.
On December 9, 2001 we released version 1.2.1 of mod_backhand, an Apache web-server module that enables cluster management and request load balancing and control for heterogeneous clusters. This version now supports Windows NT as well as several versions of Unix. More information about mod_backhand is available at www.backhand.org/mod_backhand.
Over the last year the Wackamole project has seen a steady stream of downloads from our website by commercial, individual, and academic users. In total we registered 550 distinct downloads of the software. We know of several organizations that use it in production, both as NxWay failover for servers and as NxWay failover for routers (which is interesting because we never thought about it ourselves).