E D R S I H C RSS
ID
Password
Join
모든 성공한 자들의 하인이며, 모든 실패한 자들의 주인인 자. 그것은 습관이다. -- 아무개

 * 원문링크 : [http]http://www.spread.org/docs/guide/ 의 사용자가이드에서 "1장: Introduction to Spread" 부분

Contents

1 What is Spread?
2 Design Issues
2.1 Comparison with reliable IP-multicast
2.2 Flexibility of services
2.3 Modularity of Spread architecture
3 Spread Guarantees
3.1 Ordering
3.2 Reliability
4 Additional Information

1 What is Spread? #

When designing distributed applications one must make a number of architectural choices. These choices include how communication between applications will be handled, what the roles of each process will be, how dependent each machine is on the others for operation of the application, etc. Part of what makes creating reliable, high-performance, useful distributed applications hard is the number of fundamental choices one must make and the complex interactions between each choice.

The group communication model is a framework that provides both a physical toolkit upon which to build and a model which limits the number of choices that must be met. This model simplifies the task of constructing a reliable, correct distributed application while still giving the user a powerful set of abstractions upon which many different distributed applications can be built. It is certainly true that not every application can be built using the group communication model, and even if it could the negative characteristics of the model can make group communications a bad choice. What group communications does do, however, is make a large number of distributed applications easier to build and more powerful. It is no different than any other higher level abstraction. For example, one could build every network application by creating IP level packets by hand, having the application provide packet checksums, multiplexing, reliability, ordering, and flow control, but everyone realizes that although that is the most powerful approach (and it is used for some specialized applications) in almost every case you want to use a high level API like sockets and an established network protocol like TCP. The basic services provided by most group communication systems are:
  1. Abstraction of a Group (a name representing a set of processes, all of whom receive any messages sent to the Group).
  2. Multicast of messages to a Group.
  3. Membership of a Group.
  4. Reliable messages to a Group.
  5. Ordering of messages sent to a Group.
  6. Failure detection of members of the Group.
  7. A strong semantic model of how messages are handled when changes to the Group membership occur.

It should be obvious that the name “Group Communications System” is very appropriate, as the concept of a “Group” is the fundamental abstraction of the system. Once you have that abstraction all the other services make sense: knowing who is in the group, talking to the group, knowing when someone leaves the group, agreeing on an ordering of events in the group.

Here are a few distinct example applications that exhibit how the group communication model provides a useful abstraction for a wide variety of distributed applications.
  • Service and machine monitoring. A number of machines export their status to groups of interested monitors. Whenever failure occurs the monitors are notified.
  • Collaborative tools. Many different groups of participants each want to share data, video and audio conferencing.
  • DSM (Distributed Shared Memory). Sending pages of memory to machines where it is needed using reliable multicast.
  • Highly reliable services (such as air traffic control systems, stock exchanges, military tracking and combat control systems). Services that involve communication of information among numerous machines and people and have high requirements for both availability and fault-tolerance.
  • Replicated databases. A number of instances of a database exist in several different locations. They must all be kept synchronized in such a way that a client can query or update any of them and the results will be the same as if only one copy existed.

2 Design Issues #

2.1 Comparison with reliable IP-multicast #

The service provided by Spread and the service provided by many reliable IPmulticast protocols have some features in common and some differing semantics. The main area of overlap is that they both solve the problem of getting best-effort reliability when sending multicast messages for small to medium sized groups. The key difference is that most reliable IP-multicast protocols aim to be also solve that problem for very large groups, while Spread does not support very large groups, but does provide a stronger model of reliability and additional service such as ordering.

A practical difference is that reliable IP-multicast usually relies on a wide area IP-multicast network (such as the mbone, or ISP support for multicast routing) while Spread only relies on point-to-point unicast IP support, and uses IP multicast only as a performance optimization. One subtle distinction between reliable IP-multicast and Spread's Reliable service is that Spread integrates a membership notification service into the stream of messages. The membership notifications provide some knowledge of who actually received the reliable messages. The issue of membership is a key distinction between the unicast, or point-to-point world of TCP/IP and multicast services. In multicast it is often necessary to know “with whom” you are reliably communicating since there is no obvious 'other party' as in unicast.

2.2 Flexibility of services #

The key question is at what level of granularity do you define services? GCS allow a number of different levels of service and the application only pays for those that it needs (to a large degree). The GCS primitives are very flexible and many different applications can use them in different ways. The goal of Spread (not necessarily all GCS) is to support the broad middle of applications. This includes those that need more than unreliable multicast or multicast to millions of users, but don’t have extremely specialized needs such as hard real-time requirements, hardware fault-tolerance, or esoteric reliability and semantic models. The ideas of GCS have been extended to some of these extremes (especially real-time and hardware assisted fault-tolerance), and have influenced to a small degree the solutions being proposed to reliable multicast to millions of users.

A number of people assert that it is an accepted truth that no one system or protocol will work for all cases. This is essentially a truism. However, they often mean by this that NO system or protocol will be very good for more then one very narrow set of needs, and thus no one should even try to create a system to support many different families of applications. I believe that to be a false assumption because I have seen all of the applications listed above built using one group communication system. All the applications built in this way have fulfilled their requirements and performed well. That is not to say they could not have been built in other ways that might work even better, but they did everything they needed to and because of the standard abstractions and the support of an existing toolkit were able to be built much faster and with more reliability. In essence, the costs of custom designing the services that they needed instead of using the existing group communication abstractions and services would have been much higher and the performance payoff would not be enough to overcome that.

2.3 Modularity of Spread architecture #

Spread is designed to be modular in two ways. First, at the network communication level, Spread supports multiple link protocols. Second, Spread supports multiple client interfaces.

Group communication toolkits can be used in many different environments with very different network infrastructures. These different networks can have very different characteristics (latency, bandwidth, shared/point-to-point, native multicast, routed). Spread has a modular API for link protocols that allow different protocols to be used for dissemination, reliability, and flow control without changing the upper layer protocols at all. For example, Spread currently has three link protocols implemented in the base system. The first is called the Ring protocol and it provides high throughput when used on a low latency local area network of no more then about 30 daemons. The second uses TCP for transport and provides stable transport over wide area networks in a point-to-point manner. The third is called Hop and like TCP is used to cross wide area networks with high latency and non-negligible loss, but it provides higher throughput and lower message latency than TCP, and is more stable in high loss situations. The client interfaces provided with Spread include native interfaces for Java and C, and a Perl library that wraps the C interface. These interfaces are designed to be consistent with the language’s normal idioms. Spread natively only provides a toolkit level abstraction of group communication services. Higher level group tools such as replication tools, wrappers of native networking interfaces, and client specific tools can be implemented on top of the toolkit APIs. One detail is that Spread natively supports the Extended Virtual Synchrony(EVS) model (more details on this are later), however another similar model is also very common, the Virtual or View Synchrony model. To support either, Spread provides a special client library which implements View Synchrony on top of Spread's native EVS model.

3 Spread Guarantees #

Spread provides several different types of messaging services to applications. In addition to being able to send messages to entire groups of recipients and receiving membership information about who is currently alive and reachable, Spread provides both ordering and reliability guarantees.

When an application sends a Spread message it chooses a level of service for that message. The level of service selected controls what kind of ordering and reliability are provided to that message. The application can choose a different level of service for each message that it sends. Spread supports 5 different levels of service. Table 1.1 shows the different types and what kind of ordering and reliability guarantees they provide.

  • Table 1.1: Spread Message Service Types
    Spread Service TypeOrderingReliability
    UNRELIABLE_MESSNoneUnreliable
    RELIABLE_MESSNoneReliable
    FIFO_MESSFifo by SenderReliable
    CAUSAL_MESSCausal (Lamport)Reliable
    AGREED_MESSTotal Order (Consistent w/Causal)Reliable
    SAFE_MESSTotal OrderSafe

3.1 Ordering #

The ordering guarantees defined by Spread are:
  • None : No ordering guarantee. Any other message also sent with ordering “None”can arrive either before or after this one. Messages with stricter ordering CAN depend on this message. For example, if a FIFO MESS message Ma follows RELIABLE MESS message Mb then Ma cannot be delivered until Ma has been delivered (but the reverse is not true).
  • Fifo by Sender : All messages sent by this sender(A sender is defined as a particular connection to a Spread daemon, so an application with 3 connections will be considered 3 different senders) of at least Fifo ordering are delivered in FIFO order. As mentioned above a RELIABLE MESS sent after a Fifo message may be delivered before the Fifo message.
  • Causal (Lamport) : All messages sent by all senders are delivered in an order consistent with Lamport’s definition of “Causal” order. This order is consistent with Fifo ordering.
  • Total Order (Consistent w/Causal) : All messages sent by all senders are delivered in the exact same order to all recipients. This order is also consistent with Causal order. It is provided by making the partial order defined by causal into a total order. The total order uses the id of the sender to break ties.

It is important to note that messages sent with Fifo ordering or less do not support the full membership semantics of Spread . This is a result of Spread optimizing two common operations, group joins and leaves and sending FIFO or Reliable messages. First, joins and leaves of group members do not cost more then sending one SAFE message and result in no extra synchronization costs. Second, Fifo and Reliable messages are not delayed before delivery by any other messages. So even if gaps exist in the global order of all messages, Reliable messages can still be delivered and Fifo messages can be delivered as long as all the messages from their sender have arrived. Because of these two optimizations, it is possible for a Reliable or Fifo message to be delivered earlier then it would be if it was globally ordered, however a gap in the global sequence may contain a join or leave message (since they are just SAFE messages) so it might be that one process delivers the Fifo or Reliable message before the join and a different process delivers the join first and then the message.

3.2 Reliability #

The Reliability guarantees defined by Spread are:
  • Unreliable : The message is unreliable. It may be dropped or lost and will not be recovered by Spread.
  • Reliable : The message will be reliably delivered to all recipients who are members of the group to which the message was sent. Spread will recover the message to overcome any network losses.
  • Safe : The message will ONLY be delivered to a recipient if the daemon that recipient is connected to knows that all Spread daemons have the message. If a membership change occurs, and as a result the daemon cannot determine whether all daemons in the old membership have the message, then the daemon will deliver the Safe message after a TRANSITIONAL MEMBERSHIP message.

4 Additional Information #

Spread is actively developed by the Center for Networking and Distributed Systems at Johns Hopkins University. The software, documentation, community of users, and additional applications are constantly being improved and evolving. The best way to find out what is currently going on, or learn more about the Spread system is to check out our web sites: A number of research papers have been published on the Spread system and related projects. A complete list can be found on the web.

Valid XHTML 1.0! Valid CSS! powered by MoniWiki
last modified 2010-10-28 12:42:54
Processing time 0.4962 sec