DC-Chapter-2: Interprocess Communication



2.1 Introduction

The backbone of distributed computing is interprocess communications (IPC): the ability for separate, independent processes to communicate among themselves to collaborate on a task.

Figure 2.1 illustrates basic IPC: Two independent processes, possibly running on separate machines, exchange data over the interconnecting network. In this case, process 1 acts as the sender, which transmits data to process 2, the receiver.



Fig 2.1 Interprocess Communication

In distributed computing, two or more processes engage in IPC according to a protocol: a set of rules, agreed upon by the participating processes, that must be observed in the data communication. A process may be a sender at some points during a protocol and a receiver at other points.

When communication is from one process to a single other process, the IPC is said to be a unicast. When communication is from one process to a group of processes, the IPC is said to be a multicast. Figure 2.2 illustrates the concept of the two types of interprocess communications.


Fig 2.2 Unicast versus Multicast

Modern-day operating systems such as UNIX and Windows provide facilities for interprocess communications. We will call these facilities operating system-level IPC facilities, to distinguish them from higher-level IPCs. System-level IPC facilities include message queues, semaphores, and shared memory. It is possible to develop network software using these system-level facilities directly. Examples of such programs are network device drivers and system evaluation programs.
An IPC application programming interface (API) provides an abstraction of the details and intricacies of the system-level facilities, thereby allowing the programmer to concentrate on the application logic.

2.2 An Archetypal IPC Program Interface

Consider a basic API that provides the minimum level of abstraction to facilitate IPC. Four primitive operations are needed. They are:

§  Send: This operation is issued by a sending process for the purpose of transmitting data to a receiving process. The operation must allow the sending process to identify the receiving process and specify the data to be transmitted.

§  Receive: This operation is issued by a receiving process for the purpose of accepting data from a sending process. The operation must allow the receiving process to identify the sending process and specify a memory space where the data is to be stored for subsequent access by the receiver.

§  Connect: For connection-oriented IPC, there must be operations that allow a logical connection to be established between the issuing process and a specified process: one process issues a request-to-connect (connect for short) operation while the other process issues an accept-connection operation.

§  Disconnect: For connection-oriented IPC, this operation allows a previously established logical connection to be deallocated at both sides of the communication.
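To make the archetypal interface concrete, the following is a minimal sketch of how the four operations might be declared in Java; the names (IpcFacility, IpcException, and the parameter types) are illustrative assumptions, not part of any actual IPC library.

// A minimal, illustrative sketch of the archetypal IPC interface.
// The type and method names are hypothetical, chosen only to mirror
// the four primitive operations described above.
public interface IpcFacility {

    // Request-to-connect: establish a logical connection to the named process.
    void connect(String remoteProcessAddress) throws IpcException;

    // Accept-connection: wait for a connection request from a peer process.
    void acceptConnection() throws IpcException;

    // Send: transmit the given data to the identified process.
    void send(String remoteProcessAddress, byte[] data) throws IpcException;

    // Receive: accept data from the identified process into the supplied buffer,
    // returning the number of bytes actually delivered.
    int receive(String remoteProcessAddress, byte[] buffer) throws IpcException;

    // Disconnect: deallocate a previously established logical connection.
    void disconnect() throws IpcException;

    // Hypothetical exception type for reporting IPC failures.
    class IpcException extends Exception {
        public IpcException(String message) { super(message); }
    }
}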

A process involved in IPC issues these operations in some predetermined order. The issuance of each operation causes the occurrence of an event. For example, a send operation issued by a sending process results in the event wherein data is transmitted to the receiving process, while a receive operation issued by a receiving process results in data being delivered to the process.            

    Fig 2.3 Interprocess Communication in basic HTTP

Network service protocols can be implemented using these primitive IPC operations. For example, in basic HTTP (Hypertext Transfer Protocol, used extensively on the World Wide Web), one process, a Web browser, issues a connect operation to establish a logical connection to another process, a Web server, followed by a send operation that transmits data representing a request to the Web server. The Web server process in turn issues a send operation that transmits the data requested by the Web browser process. At the end of the communication, each process issues a disconnect operation to terminate the connection. Figure 2.3 illustrates the sequence of operations.
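As a concrete illustration, the following is a minimal sketch of the browser side of this exchange, assuming Java's stream-mode socket API as the underlying IPC facility; the host name and the request line are made up for the example.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Browser-side sketch of basic HTTP using connect / send / receive / disconnect.
// The host "www.example.com" and the requested path are illustrative only.
public class BasicHttpClient {
    public static void main(String[] args) throws Exception {
        // connect: establish a logical connection to the Web server process
        try (Socket socket = new Socket("www.example.com", 80)) {
            // send: transmit data representing the request
            OutputStream out = socket.getOutputStream();
            String request = "GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n";
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // receive: accept the data (the response) sent by the server process
            InputStream in = socket.getInputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                System.out.write(buffer, 0, n);
            }
            System.out.flush();
        } // disconnect: the connection is deallocated when the socket is closed
    }
}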

2.3 Event Synchronization

A main difficulty with IPC is that the processes involved execute independently, with neither process knowing what takes place in the process at the other end.

Consider basic HTTP, as described above. As you can see, the two sides involved in the protocol must issue the IPC operations in a specific order. For example, the browser process must not issue the send operation until the connect operation has completed. It is also important that the Web server not begin to transmit the requested data until the browser process is ready to receive it.
Furthermore, the browser process needs to be notified when the requested data has been received so that it may subsequently process the data, including formatting and displaying the data to the browser user.

The simplest way for an IPC facility to provide event synchronization is through blocking: the suspension of a process's execution until an operation issued by the process has completed.

To illustrate the use of blocking for event synchronization, consider again basic HTTP. A browser process issues a blocking connect operation, which blocks further execution of the process until the connection has been acknowledged by the server side. Subsequently, the browser process issues a blocking receive operation, which suspends execution of the process until the operation is completed (whether successfully or not). The blocking or unblocking is performed by the operating system and is initiated by the IPC facilities, not by the programmer. The programs for the two processes are shown in Figure 2.4.

The blocking will terminate when the operation is fulfilled, at which time the process is said to be unblocked. An unblocked process transitions to the ready state and will resume execution in due course. If the operation cannot be fulfilled, the process will remain in the blocked state indefinitely unless intervening measures are taken; this situation is known as indefinite blocking.

Blocking operations are also referred to as synchronous operations. Alternatively, IPC operations may be asynchronous or nonblocking operations. An asynchronous operation issued by a process will not cause blocking and therefore the process is free to continue with its execution once the asynchronous operation is issued to the IPC facility. The process will subsequently be notified by the IPC facility when and if the operation is fulfilled.
A nonblocking or asynchronous operation can be issued by a process when that process may proceed without waiting for the completion of the event that the operation initiates. For example, the receive operation issued by the Web browser must be blocking, because the browser process must wait for the response from the Web server in order to proceed with further processing. On the other hand, the send operation issued by the Web server can be nonblocking, because the Web server need not wait for the completion of the send operation before proceeding with the next operation (the disconnect), so that it may proceed to service other Web browser processes.


Fig 2.4 Program flow in 2 programs involved in an IPC
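As a sketch of how the two styles differ in code, assuming Java's SocketChannel as the IPC facility (the address and port are illustrative): the first read blocks until data arrives, while the second, issued on a channel configured as nonblocking, returns immediately even if no data is available.

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Sketch contrasting a blocking (synchronous) receive with a
// nonblocking (asynchronous-style) receive. The server address is illustrative.
public class BlockingVersusNonblocking {
    public static void main(String[] args) throws Exception {
        ByteBuffer buffer = ByteBuffer.allocate(1024);

        // Blocking receive: read() suspends the calling process/thread
        // until at least some data has arrived.
        try (SocketChannel blocking = SocketChannel.open(
                new InetSocketAddress("localhost", 9000))) {
            int n = blocking.read(buffer);           // blocks here
            System.out.println("blocking read returned " + n + " bytes");
        }

        buffer.clear();

        // Nonblocking receive: read() returns immediately; a result of 0
        // means no data was available yet, and the process is free to
        // continue with other work and try again later.
        try (SocketChannel nonblocking = SocketChannel.open(
                new InetSocketAddress("localhost", 9000))) {
            nonblocking.configureBlocking(false);
            int n = nonblocking.read(buffer);        // does not block
            System.out.println("nonblocking read returned " + n + " bytes");
        }
    }
}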

2.4 Different Types of Send and Receive Operations in an IPC

2.4.1 Synchronous Send and Synchronous Receive

Figure 2.5 illustrates the event synchronization for a protocol session implemented using synchronous send and synchronous receive operations. In this scenario, a receive operation causes the suspension of the issuing process (process 2) until data is received to fulfill the operation. Likewise, a send operation causes the sending process (process 1) to suspend. When the data sent has been received by process 2, the IPC facility on host 2 sends an acknowledgment to the IPC facility on host 1, and process 1 may subsequently be unblocked.

The use of synchronous send and synchronous receive is warranted if the application logic of both processes requires that the data sent must be received before further processing can proceed.

Depending on the implementation of the IPC facility, the synchronous receive operation may not be fulfilled until the amount of data the receiver expects to receive has arrived. For example, if process 2 issues a receive for 300 bytes of data, and the send operation brings only 200 bytes, it is possible for the blocking of process 2 to continue even after the first 200 bytes have been delivered; in such a case, process 2 will not be unblocked until process 1 subsequently sends the remaining 100 bytes of data.

     Fig 2.5 Synchronous Send and Synchronous Receive
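The following sketch shows the kind of loop a receiver uses when the facility may deliver fewer bytes than expected at once; it assumes a plain Java InputStream and simply repeats the receive until the full amount (for example, the 300 bytes above) has arrived.

import java.io.IOException;
import java.io.InputStream;

// Helper sketch: keep receiving until exactly 'length' bytes have arrived,
// mirroring a synchronous receive that is not fulfilled until the expected
// amount of data has been delivered.
public class ReceiveExactly {
    public static void readFully(InputStream in, byte[] buffer, int length)
            throws IOException {
        int received = 0;
        while (received < length) {
            int n = in.read(buffer, received, length - received);
            if (n == -1) {
                throw new IOException("connection closed after "
                        + received + " of " + length + " bytes");
            }
            received += n;   // e.g. the first read may deliver only 200 of 300 bytes
        }
    }
}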

 2.4.2 Asynchronous Send and Synchronous Receive

           
Fig 2.6 Asynchronous Send and Synchronous Receive


Figure 2.6 illustrates an event diagram for a protocol session implemented using asynchronous send and synchronous receive operations. As before, a receive operation issued will cause the suspension of the issuing process until data is received to fulfill the operation. However, a send operation issued will not cause the sending process to suspend. In this case the sending process is never blocked, so no acknowledgment is necessary from the IPC facility on the host of process 2, if the sender's application logic does not depend on the receiving of the data at the other end. However, depending on the implementation of the IPC facility, there is no guarantee that the data sent will actually be delivered to the receiver. For example, if the send operation is executed before the corresponding receive operation is issued on the other side, it is possible that the data will not be delivered to the receiving process unless the IPC facility makes provisions to retain the prematurely sent data.

2.4.3 Synchronous Send and Asynchronous Receive

An asynchronous receive operation causes no blocking of the process that issues the operation, and the outcome will depend on the implementation of the IPC facility. The receive operation will, in all cases, return immediately, and there are three scenarios for what happens subsequently:

Scenario 1: The data requested by the receive operation has already arrived at the time when the receive operation is issued. In this case, the data is delivered to process 2 immediately and an acknowledgment from host 2's IPC facility will unblock process 1.

Scenario 2: The data requested by the receive operation has not yet arrived, and thus no data is delivered to the process. It is the receiving process's responsibility to check whether the data has indeed been received and, if not, to reissue the receive operation until the data arrives. Process 1 remains blocked until process 2 reissues the receive operation and an acknowledgment eventually arrives from host 2's IPC facility.

Scenario 3: The data requested by the receive operation has not yet arrived. The IPC facility of host 2 will notify process 2 when the data it requested has arrived, at which point process 2 may proceed to process the data. This scenario requires that process 2 provide a listener or event handler that can be invoked by the IPC facility to notify the process of the arrival of the requested data.


  
Fig 2.7 Synchronous Send and Asynchronous Receive         
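Scenario 3 corresponds to the callback style offered by, for example, Java's AsynchronousSocketChannel. The sketch below registers a completion handler that the IPC facility invokes once the requested data has arrived; the buffer size is arbitrary and the channel is assumed to have been connected elsewhere.

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

// Sketch of an asynchronous receive with a listener (scenario 3):
// the receive returns immediately, and the handler is invoked later
// by the IPC facility when the requested data has arrived.
public class AsyncReceiveSketch {
    public static void receiveAsync(AsynchronousSocketChannel channel) {
        ByteBuffer buffer = ByteBuffer.allocate(1024);

        channel.read(buffer, buffer, new CompletionHandler<Integer, ByteBuffer>() {
            @Override
            public void completed(Integer bytesRead, ByteBuffer attachment) {
                // Invoked by the facility when the data has arrived;
                // the process may now proceed to process the data.
                System.out.println("received " + bytesRead + " bytes");
            }

            @Override
            public void failed(Throwable exc, ByteBuffer attachment) {
                System.err.println("receive failed: " + exc);
            }
        });

        // Execution continues here immediately; the process is not blocked.
    }
}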


2.4.4 Asynchronous Send and Asynchronous Receive

Without blocking on either side, the only way that the data can be delivered to the receiver is if the IPC facility retains the data received. The receiving process can then be notified of the data's arrival (see Figure 2.8). Alternatively, the receiving process may poll for the arrival of the data and process it when the awaited data has arrived.


      
Fig 2.8 Asynchronous Send and Asynchronous Receive
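A minimal sketch of the polling alternative, assuming a plain Java InputStream: the process repeatedly checks whether the awaited amount of data has arrived and does other work in between. The 100-millisecond polling interval and the helper doOtherWork are illustrative.

import java.io.IOException;
import java.io.InputStream;

// Polling sketch for an asynchronous-style receive: instead of blocking or
// registering a listener, the process checks periodically whether the awaited
// data has arrived, doing other work in between.
public class PollingReceiver {
    public static byte[] pollFor(InputStream in, int expected)
            throws IOException, InterruptedException {
        while (in.available() < expected) {   // awaited data not fully arrived yet
            doOtherWork();                    // free to continue with other processing
            Thread.sleep(100);                // then poll again
        }
        byte[] buffer = new byte[expected];
        int n = in.read(buffer, 0, expected); // returns promptly: data is buffered locally
        return n == expected ? buffer : java.util.Arrays.copyOf(buffer, Math.max(n, 0));
    }

    private static void doOtherWork() { /* application-specific processing */ }
}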

2.5 Timeouts and Threading

Although blocking provides the necessary synchronization for IPC, it is generally unacceptable to allow a process to be suspended indefinitely. There are two measures to address this issue. First, timeouts may be used to set a maximum time period for blocking. Timeouts are provided by the IPC facility and may be specified in a program with an operation. Second, a program may spawn a child process or a thread to issue a blocking operation, allowing the main thread or parent process of the program to proceed with other processing while the child process or child thread is suspended. Figure 2.9 illustrates this use of a thread.
Timeouts are important if the execution of a synchronous operation has the potential of resulting in indefinite blocking. For example, a blocking connect request can result in the requesting process being suspended indefinitely if the connection is unfulfilled or cannot be fulfilled as a result of a breakdown in the network connecting the two processes. In such a situation, it is typically unacceptable for the requesting process to "hang" indefinitely. Indefinite blocking can be avoided by using a timeout. For example, a timeout period of 30 seconds may be specified with the connect request. If the request is not completed within approximately 30 seconds, it will be aborted by the IPC facility, at which time the requesting process will be unblocked, allowing it to resume processing.
  

   Fig 2.9 Using a thread for a blocking operation
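The following sketch illustrates both measures using Java's socket API: a 30-second timeout is specified with the connect request (and with subsequent receive operations), and a separate thread is spawned to issue a blocking receive so that the main thread may proceed. The host, port, and timeout values are illustrative.

import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch of the two measures against indefinite blocking:
// (1) timeouts on blocking operations, (2) a separate thread for a blocking call.
public class TimeoutAndThreadSketch {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();

        // Measure 1: a timeout on the blocking connect request (30 seconds).
        try {
            socket.connect(new InetSocketAddress("localhost", 9000), 30_000);
        } catch (SocketTimeoutException e) {
            System.err.println("connect aborted after timeout; process unblocked");
            return;
        }

        // A timeout can also be applied to subsequent blocking receive operations.
        socket.setSoTimeout(30_000);

        // Measure 2: spawn a thread to issue the blocking receive, so the
        // main thread may proceed with other processing in the meantime.
        Thread receiver = new Thread(() -> {
            try (InputStream in = socket.getInputStream()) {
                byte[] buffer = new byte[1024];
                int n = in.read(buffer);      // this thread blocks, not the main thread
                System.out.println("received " + n + " bytes");
            } catch (Exception e) {
                System.err.println("receive failed or timed out: " + e);
            }
        });
        receiver.start();

        // ... the main thread continues with other processing here ...
        receiver.join();
    }
}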

2.6 Deadlocks and Timeouts

Indefinite blocking may also be caused by a deadlock. In IPC, a deadlock can result from operations that were issued improperly, perhaps owing to a misunderstanding of a protocol, or owing to programming errors.

Fig 2.10 A deadlock caused by blocking operations
Figure 2.10 illustrates such a case. In process 1, a blocking receive operation is issued to receive data from process 2. Concurrently, process 2 issues a blocking receive operation where a send operation was intended. As a result, both processes are blocked awaiting data from the other, which can never arrive (since each process is now blocked). Each process will therefore be suspended indefinitely, until a timeout occurs or the operating systems abort the processes.
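A minimal sketch of how such a deadlock can arise in code, assuming both processes use blocking reads on a stream socket: process 2 was intended to send first but mistakenly issues a receive, so each side waits forever for data the other will never send.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// Deadlock sketch: both processes issue a blocking receive first,
// so each side waits indefinitely for data the other never sends.
public class DeadlockSketch {

    // Code running in process 1 (intended behavior: receive, then send).
    static void process1(Socket s) throws Exception {
        BufferedReader in =
                new BufferedReader(new InputStreamReader(s.getInputStream()));
        String request = in.readLine();          // blocks waiting for process 2
        new PrintWriter(s.getOutputStream(), true).println("reply to " + request);
    }

    // Code running in process 2 (programming error: should have sent first).
    static void process2(Socket s) throws Exception {
        BufferedReader in =
                new BufferedReader(new InputStreamReader(s.getInputStream()));
        String reply = in.readLine();             // also blocks: deadlock
        System.out.println("got " + reply);
    }
}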

2.7 Data Representation

At the physical layer (that is, the lowest layer, as opposed to the application layer, which is the highest) of the network architecture, data is transmitted as analog signals, which represent a binary stream. At the application layer, a more complex representation of transmitted data is needed in order to support data types and data structures provided in programming languages, such as character strings, integers, floating point values, arrays, records, and objects.

Consider an integer value that needs to be sent by process 1. This value is represented in the integer representation of Host A, which is a 32-bit machine that uses the "big-endian" representation for multi-byte data types. (The terms big endian and little endian refer to the order in which the bytes of a multi-byte data type are stored. In big-endian architectures, the leftmost bytes [those with the lowest addresses] are the most significant. In little-endian architectures, the rightmost bytes are the most significant.)

Host B, on the other hand, is a 16-bit machine that uses the "little-endian" representation. Suppose the value is sent as a 32-bit stream directly from process 1's memory and placed into process 2's memory location. Then 16 bits of the value sent will need to be truncated, since an integer value occupies only 16 bits on Host B, and the byte order of the integer representation must be swapped in order for the value to be interpreted correctly by process 2.

As this example illustrates, when heterogeneous hosts are involved in IPC, it is not enough to transmit data values or structures as raw bit streams unless the participating processes take measures to package and interpret the data appropriately. For our example, there are three schemes for doing so:

1. Prior to issuing the send operation, process 1 converts the value of the integer to the 16-bit, little-endian data representation of process 2.
2. Process 1 sends the data in 32-bit, big-endian representation. Upon receiving the data, process 2 converts it to its 16-bit, little-endian representation.
3. A third scheme is for the processes to exchange the data in an external representation: data will be sent using this representation and the data received will be interpreted using the external representation and converted to the native representation.
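As a sketch of the first scheme, using Java's ByteBuffer: the sender converts the integer into the receiver's 16-bit, little-endian representation before issuing the send operation. The class and method names are illustrative, and the code assumes the value fits in 16 bits.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of scheme 1: the sender converts its native (32-bit, big-endian)
// integer into the receiver's representation (16-bit, little-endian)
// before issuing the send operation.
public class ByteOrderConversion {
    public static byte[] toLittleEndian16(int value) {
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new IllegalArgumentException("value does not fit in 16 bits");
        }
        return ByteBuffer.allocate(2)
                .order(ByteOrder.LITTLE_ENDIAN)   // receiver's byte order
                .putShort((short) value)          // receiver's 16-bit width
                .array();
    }

    public static void main(String[] args) {
        byte[] wire = toLittleEndian16(300);
        // 300 = 0x012C, so the little-endian bytes on the wire are 0x2C, 0x01
        System.out.printf("0x%02X 0x%02X%n", wire[0] & 0xFF, wire[1] & 0xFF);
    }
}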

Some well-known external data representation schemes are:
Ø Sun XDR
Ø ASN.1 (Abstract Syntax Notation One)
Ø XML (Extensible Markup Language)

2.8 Data Encoding

Although customized programs can be written to perform IPC using any mutually agreed upon scheme of data marshaling, general-purpose distributed applications require a universal, platform-independent scheme for encoding the exchanged data. Hence there exist network data encoding standards.

At the simplest level, an encoding scheme such as External Data Representation (XDR) allows a selected set of programming data types and data structures to be specified with IPC operations. The data marshaling and unmarshaling are performed automatically by the IPC facilities on the two ends, transparent to the programmer.

At a higher level of abstraction, there are standards such as ASN.1 (Abstract Syntax Notation One). ASN.1 is an Open Systems Interconnection (OSI) standard that specifies a transfer syntax for representing network data. The standard covers a wide range of data structures (such as sets and sequences) and data types (such as integer, Boolean, and character) and supports the concept of data tagging. Each data item transmitted is encoded using a syntax that specifies its type, its length, its value, and optionally a tag that identifies a specific way of interpreting the syntax.

At an even higher level of abstraction, the Extensible Markup Language (XML) has emerged as a data description language for data sharing among applications, primarily Internet applications, using syntax similar to that of the Hypertext Markup Language (HTML), the language used for composing Web pages. XML goes one step beyond ASN.1 in that it allows a user to define customized tags (such as <message>, <to>, and <from>) to specify a unit of data content. XML can be used to facilitate data interchange among heterogeneous systems, to segregate the data content of a Web page (written in XML) from its display syntax (written in HTML), and to allow the data to be shared among applications.
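For instance, a unit of data content using the customized tags mentioned above might be exchanged as the following XML fragment (the element contents are invented for illustration):

<message>
  <to>process2@hostB</to>
  <from>process1@hostA</from>
  Hello from process 1
</message>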

2.9 Text-based protocols

Data marshaling is at its simplest when the data exchanged is a stream of characters or text encoded using a representation such as ASCII. Exchanging data in text has the additional advantage that the data can be easily parsed in a program and displayed for human perusal. Hence it is a popular practice for protocols to exchange requests and responses in the form of character strings. Such protocols are said to be text-based. Many popular network protocols, including FTP (File Transfer Protocol), HTTP and SMTP (Simple Mail Transfer Protocol) are text-based.


2.10 Request-Response Protocols

An important type of protocol is the request-response protocol. In this protocol, one side issues a request and awaits a response from the other side. Subsequently, another request may be issued, which in turn elicits another response. The protocol proceeds in an iteration of request-response, until the desired task is completed. The popular network protocols FTP, HTTP, and SMTP are all request-response protocols.


2.11 Event Diagram and Sequence Diagram

An event diagram is a diagram that can be used to document the detailed sequence of events and blocking during the execution of a protocol. Figure 2.11 is an event diagram for a request-response protocol involving two concurrent processes, A and B. The execution of each process with respect to time is represented using a vertical line, with time increasing downward. A solid interval along the execution line represents a time period during which the process is active, while a broken interval represents a period during which the process is blocked. In the example, both processes are initially active. Process B issues a blocking receive operation in anticipation of request 1 from process A.
     

Fig 2.11 Event Diagram of a protocol

Process A meanwhile issues the awaited request 1 using a nonblocking send operation, followed by a blocking receive operation in anticipation of process B's response. The arrival of request 1 reactivates process B, which processes the request before issuing a send operation to transmit response 1 to process A. Process B then issues a blocking receive for request 2 from process A. The arrival of response 1 unblocks process A, which resumes execution to process the response and to issue request 2, which in turn unblocks process B. A similar sequence of events follows.

Note that each round of request-response entails two pairs of send and receive operations to exchange two messages. The protocol can extend to any number of rounds of exchange using this pattern.

Figure 2.15 uses an event diagram to describe basic HTTP. In its basic form, HTTP is a text-based, request-response protocol that calls for only one round of exchange of messages. A Web server process is a process that constantly listens for incoming requests from Web browser processes. A Web browser process makes a connection to the server, and then issues a request in a format dictated by the protocol. The server processes the request and dispatches a response composed of a status line, header information, and the document requested by the browser process. Upon receiving the response, the browser process parses the response and displays the document.
An event diagram is a useful device for illustrating synchronization events. A simplified form of diagram, known as a sequence diagram and part of the UML notation, is more commonly used to document interprocess communication. In a sequence diagram, the execution flow of each participant in a protocol is represented as a dashed line, with no distinction between the blocked and executing states. Each message exchanged between the two sides is shown as a directed line between the two dashed lines, with a descriptive label above the directed line, as illustrated in Figure 2.12.

Fig 2.12 A Sequence Diagram

The sequence diagram for the basic HTTP is shown in Figure 2.13.

Fig 2.13 The sequence diagram for HTTP

2.12 Connection-oriented versus Connectionless IPC

Using a connection-oriented IPC facility, two processes first establish a connection (which is implemented in software rather than being a physical link), then exchange data by inserting data into and extracting data from the connection. Once a connection is established, there is no need to identify the sender or the receiver with each exchange. Using a connectionless IPC facility, data is exchanged in independent packets, each of which must be addressed specifically to the receiver.
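As a sketch of the difference, using Java's socket API with illustrative addresses and ports: with a connection-oriented (TCP) socket, the receiver is identified once when the connection is established, whereas with a connectionless (UDP) socket every packet carries the receiver's address.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch: connection-oriented (TCP) versus connectionless (UDP) IPC.
// Host names and port numbers are illustrative.
public class ConnectionStyles {
    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);

        // Connection-oriented: the receiver is identified when the
        // connection is established; subsequent writes need no address.
        try (Socket tcp = new Socket("localhost", 9000)) {
            tcp.getOutputStream().write(data);
        }

        // Connectionless: each independent packet is addressed explicitly
        // to the receiver.
        try (DatagramSocket udp = new DatagramSocket()) {
            DatagramPacket packet = new DatagramPacket(
                    data, data.length, InetAddress.getByName("localhost"), 9000);
            udp.send(packet);
        }
    }
}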

2.13 The Evolution of Paradigms for Interprocess Communication

At the least abstract level, IPC involves the transmission of a binary stream over a connection, using low-level serial or parallel data transfer. This IPC paradigm may be appropriate for network driver software, for instance.

At the next level is a well-known paradigm called the socket application programming interface (the socket API). Using the socket paradigm, two processes exchange data through logical constructs called sockets, one established at each end. Data to be sent is written to a socket; at the other end, the receiving process reads or extracts data from its socket.

The Remote Procedure Call (RPC) or Remote Method Invocation (RMI) paradigm provides further abstraction by allowing a process to make procedure calls or method invocations to a remote process, with data transferred between the two processes as arguments and return values.
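A minimal sketch of the RMI flavor of this paradigm, assuming Java RMI: the remote interface below declares a method that a client process can invoke across the network; the interface name, method, and registry URL are invented for illustration.

import java.rmi.Remote;
import java.rmi.RemoteException;

// Sketch of the RMI paradigm: a client process invokes a method on a
// remote object as if it were local; the arguments and the return value
// carry the data between the two processes.
public interface QuoteService extends Remote {
    String quoteOfTheDay(String topic) throws RemoteException;
}

// Client-side usage in another process (sketch):
//   QuoteService service =
//       (QuoteService) java.rmi.Naming.lookup("rmi://localhost/QuoteService");
//   System.out.println(service.quoteOfTheDay("distributed computing"));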
