A Cost-Benefit Approach to Fault Tolerant Communication and Information Access
Yearly Technical Report, July 2003
Objective: Our goal is to develop a Cost-Benefit framework for fault-tolerant communication and information access that addresses extremely powerful adversaries not handled by previous work. The project will develop the theory and algorithms required to overcome strong network attacks while providing provable performance bounds. We will build a system that incorporates these algorithms and exhibits good performance in practice.
Approach: Our technical approach includes the following innovative topics:
Overlay network infrastructure: In order to better analyze and understand the overlay network paradigm in an environment with weaker semantics, we developed a stand-alone prototype called Spines that uses a client-daemon architecture to build and automatically configure a dynamic overlay network. The overlay network is designed to be highly scalable: it imposes no limit on the number of nodes or links beyond what the routing protocol in use can support.
We focused our work on the client-daemon communication in Spines. We developed a socket-like API for unreliable, best-effort UDP communication, and also for session-based, TCP-like reliable communication. This provides the level of transparency necessary to make Spines easy to use within the socket programming paradigm, a first step towards complete transparency. We also analyzed the possibilities of providing an IP multicast service in Spines while using only simple unicast communication at the network level.
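The socket-like flavor of the API can be illustrated with a thin wrapper over a plain UDP socket. This is a hypothetical sketch with invented names (the real Spines API differs; in Spines the wrapper would talk to a local overlay daemon rather than send raw UDP), but it shows the transparency goal: the application keeps the familiar bind/sendto/recvfrom shape.

```python
import socket

# Hypothetical sketch of a socket-like overlay API. The class name and
# methods are illustrative, not the actual Spines interface; a real
# implementation would hand datagrams to a local overlay daemon.
class SpinesLikeSocket:
    def __init__(self):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.settimeout(2.0)

    def bind(self, addr):
        self._sock.bind(addr)

    def sendto(self, data: bytes, addr):
        return self._sock.sendto(data, addr)

    def recvfrom(self, bufsize: int = 4096):
        return self._sock.recvfrom(bufsize)

# Usage mirrors ordinary UDP socket code:
rx = SpinesLikeSocket()
rx.bind(("127.0.0.1", 0))               # OS picks a free port
port = rx._sock.getsockname()[1]
tx = SpinesLikeSocket()
tx.sendto(b"best-effort datagram", ("127.0.0.1", port))
data, _ = rx.recvfrom()
```

Because the call shapes match the BSD socket API, porting an application to such an overlay is largely a matter of swapping the socket type.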
We developed end-to-end reliability on top of our hop-by-hop reliability approach. The result is a complete socket capability, similar to a TCP socket, that flows over the overlay end-to-end. As a by-product of our approach, we can now provide a TCP-fair implementation of an efficient user-level reliable protocol.
We demonstrated that employing hop-by-hop reliability techniques considerably reduces the average latency and jitter of reliable communication while remaining fair to external Internet traffic. In order to deploy our protocols over the Internet, we considered networking aspects such as congestion control, internal and external fairness, flow control, and end-to-end reliability.
We showed that the benefit of hop-by-hop reliability greatly outweighs the overhead associated with reliable overlay routing, caused by factors such as processing overhead and CPU scheduling, and that it achieves much better performance than standard end-to-end TCP connections deployed on the same overlay network.
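The intuition behind this benefit can be sketched with a simple expected-cost model (our own illustration, not drawn from the Spines implementation): over a path of n links, each independently dropping packets with rate p, an end-to-end scheme pays for the whole path again on every retry, while hop-by-hop recovery retransmits only across the lossy link.

```python
# Expected number of link transmissions to deliver one packet reliably
# over a path of n links, each with independent loss rate p.
# (Illustrative model only; real protocols add timers and batching.)
def end_to_end_cost(n: int, p: float) -> float:
    # A packet must survive all n links; per-attempt success is (1-p)^n,
    # so the sender expects 1 / (1-p)^n attempts, each crossing n links.
    return n / (1.0 - p) ** n

def hop_by_hop_cost(n: int, p: float) -> float:
    # Each hop recovers locally: 1 / (1-p) expected sends per link.
    return n / (1.0 - p)

e2e = end_to_end_cost(10, 0.05)   # 10 overlay hops, 5% loss per link
hbh = hop_by_hop_cost(10, 0.05)
```

With 10 hops at 5% loss the end-to-end scheme expects roughly 16.7 link transmissions per packet versus about 10.5 for hop-by-hop recovery, and the gap widens as paths lengthen or loss increases.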
We designed a framework for application-level, transparent reliable multicast using the hop-by-hop reliability in Spines. The framework includes end-to-end reliability, congestion and flow control, and relaxed semantics over reliable multicast that handle partitions, merges, crashes, and recoveries. We started the implementation of this framework in our overlay infrastructure.
We investigated some of the survivability aspects of Spines, in both wireless and wired environments. We developed a trust mechanism based on monitoring the abnormal behavior of overlay nodes, and an accusation system that reroutes packets to avoid untrusted nodes. We released the first version of Spines (www.spines.org) under a standard BSD license.
We implemented best-effort multicast in Spines, with an interface that resembles the standard IP Multicast service. Our preliminary tests show that Spines scales well with the number of senders, receivers, and groups. We plan to release a new version of Spines that incorporates best-effort multicast soon.
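The core idea of offering a multicast service over unicast-only links can be sketched as follows. This is a deliberately simplified illustration with invented names, not the Spines data structures: a daemon tracks group membership and fans each logical multicast out as individual unicast deliveries.

```python
# Sketch of multicast over unicast: the daemon keeps a group -> members
# map and turns one multicast send into len(members) unicast deliveries.
# (Illustrative only; not the actual Spines implementation.)
class OverlayDaemon:
    def __init__(self):
        self.groups = {}    # group address -> set of member ids
        self.inboxes = {}   # member id -> list of delivered messages

    def join(self, group, member):
        self.groups.setdefault(group, set()).add(member)
        self.inboxes.setdefault(member, [])

    def leave(self, group, member):
        self.groups.get(group, set()).discard(member)

    def send(self, group, msg):
        # One logical multicast becomes one unicast per current member.
        for member in self.groups.get(group, ()):
            self.inboxes[member].append(msg)

daemon = OverlayDaemon()
daemon.join("225.0.1.1", "a")   # IP-multicast-style group address
daemon.join("225.0.1.1", "b")
daemon.send("225.0.1.1", "hello")
```

Keeping the group interface identical to IP Multicast while hiding the unicast fan-out inside the daemons is what lets applications use the service without any network-level multicast support.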
New replication protocol: We continued to work on optimizing and evaluating the replication architecture. We discovered and corrected several performance issues in the engine itself and designed a significant latency optimization for Safe messages in the Spread Toolkit that improved the performance of the replication system as a whole. A complete replicated database solution for the PostgreSQL database was produced and formed the baseline version on which we ran experiments.
We have benchmarked the replication server, with and without the PostgreSQL database, both on a local-area cluster and on general wide-area networks using the Emulab facility hosted by the University of Utah.
We obtained performance results that demonstrate the efficiency of our replication architecture, due to the use of an enhanced synchronization algorithm. We show that latency is not a limiting factor in attaining high throughput in wide-area network environments. We are able to sustain similar aggregate throughput on both local-area and wide-area setups, outperforming existing synchronous replication solutions and providing grounds for a wide range of applications to adopt replication as a measure for fault tolerance and high availability.
Wackamole: We have developed and released Wackamole, a software tool that provides N-way failover for IP addresses in a cluster.
Wackamole is a tool that helps make a cluster highly available. It manages a set of virtual IP addresses that should be available to the outside world at all times. Wackamole ensures that exactly one machine within the cluster is listening on each virtual IP address it manages. If it discovers that particular machines within the cluster are no longer alive, it almost immediately ensures that other machines acquire the virtual IP addresses the failed machines were managing. At no time will more than one connected machine be responsible for any virtual IP.
Wackamole also works toward achieving a balanced distribution of the public IPs within the cluster it manages.
Wackamole uses the membership notifications provided by the Spread Toolkit, also developed in our lab, to generate a consistent state that is agreed upon among all of the connected Wackamole instances. Wackamole uses this knowledge to ensure that all of the public IP addresses served by the cluster will be covered by exactly one Wackamole instance. We have designed and formally proven the correctness of the algorithm used by Wackamole.
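The essence of this approach can be sketched as follows. This is our own simplified illustration, not Wackamole's actual algorithm: because every node receives the same agreed-upon membership from the group communication layer, a deterministic function of that membership lets every node compute the same virtual-IP assignment locally, so each address is claimed by exactly one live machine with no further coordination.

```python
# Sketch: given a membership list all nodes agree on (e.g. delivered by
# a group communication system like Spread) and the set of virtual IPs,
# derive a deterministic, balanced assignment. Every node computes the
# same answer, so each VIP has exactly one owner.
# (Illustrative only; not Wackamole's real algorithm.)
def assign_vips(members, vips):
    members = sorted(members)            # identical order on every node
    return {vip: members[i % len(members)]
            for i, vip in enumerate(sorted(vips))}

vips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
before = assign_vips(["n1", "n2", "n3"], vips)
# n2 crashes: the new agreed membership reassigns its VIPs to survivors.
after = assign_vips(["n1", "n3"], vips)
```

Round-robin over the sorted membership also keeps the distribution balanced: no node owns more than one VIP above its fair share.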
Wackamole now supports four platforms: Linux, FreeBSD, Solaris 8, and Mac OS X. Development has also focused on making Wackamole more robust and on fixing deployment issues reported by users. Based on email queries and download counts, Wackamole has begun to make an impact as a different model for IP failover in clusters and is being used in practice.
Software: We have released version 1.0 of Spines, an Overlay Network Research Platform. Version 1.0 supports unicast best-effort and reliable communication with an interface similar to the Unix socket interface. Spines is available at www.spines.org.
On November 15, 2002 we released version 2.0.0 of Wackamole, an NxWay fail-over for IP addresses in a cluster. Version 2.0.0 supports the Linux, FreeBSD, Solaris 8, and Mac OS X operating systems. Wackamole is available at www.wackamole.org.
Both the Spines and Wackamole projects have experienced a steady stream of downloads from our website over the last year, including commercial, individual, and academic users. In all, we registered 60 distinct downloads of Spines and over 1000 downloads of Wackamole. We know of several organizations that use Wackamole in production, both as NxWay failover for servers and as NxWay failover for routers (a use we had not anticipated ourselves).
The OASIS Dem/Val project used Wackamole to provide failover for edge routers. The same project also used the replication technology to maintain consistent state among different wide-area sites. The replication technology is currently being evaluated by the Future Combat System (FCS) project.