2.1 Introduction
The backbone
of distributed computing is interprocess communications (IPC): the ability for
separate, independent processes to communicate among themselves to collaborate on
a task.
Figure 2.1
illustrates basic IPC: Two independent processes, possibly running on separate
machines, exchange data over the interconnecting network. In this case, process
1 acts as the sender, which transmits data to process 2, the receiver.
Fig 2.1 Interprocess Communication
In distributed computing, two or more processes engage in IPC according to a protocol: a set of rules, agreed upon by the processes, that the participants in data communication must observe. A process may be a sender at some points during a protocol and a receiver at other points.
When
communication is from one process to a single other process, the IPC is said to
be a unicast. When communication is from one process to a group of processes,
the IPC is said to be a multicast. Figure 2.2 illustrates the concept of the
two types of interprocess communications.
Fig 2.2 Unicast versus Multicast
Modern-day
operating systems such as UNIX and Windows provide facilities for interprocess
communications. We will call these facilities operating system-level IPC
facilities, to distinguish them from higher-level IPCs. System-level IPC
facilities include message queues, semaphores, and shared memory. It is
possible to develop network software using these system-level facilities
directly. Examples of such programs are network device drivers and system
evaluation programs.
An IPC
application program interface (API), also commonly expanded to application
programming interface, provides an abstraction of the details and intricacies
of the system-level facilities, thereby allowing the programmer to concentrate
on the application logic.
2.2 An Archetypal IPC Program Interface
Consider a
basic API that provides the minimum level of abstraction to facilitate IPC.
Four primitive operations are needed. They are:
- Send: This operation is issued by a sending process for the purpose of transmitting data to a receiving process. The operation must allow the sending process to identify the receiving process and specify the data to be transmitted.
- Receive: This operation is issued by a receiving process for the purpose of accepting data from a sending process. The operation must allow the receiving process to identify the sending process and specify a memory space in which the data is to be stored, to be subsequently accessed by the receiver.
- Connect: For connection-oriented IPC, there must be operations that allow a logical connection to be established between the issuing process and a specified process: one process issues a request-to-connect (connect for short) operation while the other process issues an accept-connection operation.
- Disconnect: For connection-oriented IPC, this operation allows a previously established logical connection to be deallocated at both sides of the communication.
A
process involved in IPC issues these operations in some predetermined order.
The issuance of each operation causes the occurrence of an event. For example,
a send operation issued by a sending process results in the event wherein data
is transmitted to the receiving process, while a receive operation issued by a
receiving process results in data being delivered to the process.
Fig 2.3 Interprocess Communication in basic HTTP
Network service protocols can be implemented using primitive IPC operations. For example, in basic HTTP (Hypertext Transfer Protocol, used extensively on the World Wide Web), one process, a Web browser, issues a connect operation to establish a logical connection to another process, a Web server. The browser then issues a send operation to the Web server, which transmits data representing a request. The Web server process in turn issues a send operation that transmits the data requested by the Web browser process. At the end of the communication, each process issues a disconnect operation to terminate the connection. Figure 2.3 illustrates the sequence of operations.
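The sequence of operations described above can be sketched with Python's standard `socket` module. This is only an illustration under stated assumptions: the function names (`serve_once`, `fetch`, `demo`), the loopback address, and the toy response body are hypothetical, and the "server" handles a single request rather than being a real Web server.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server side: accept one connection, receive a request, send a response."""
    conn, _addr = listener.accept()              # accept-connection (blocks)
    with conn:
        request = conn.recv(1024)                # receive the request
        first_line = request.split(b"\r\n")[0]
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nYou asked for: " + first_line)  # send
    # Leaving the with-block closes the socket: the server's disconnect.


def fetch(host: str, port: int, path: str) -> bytes:
    """Browser side: connect, send a request, receive the response, disconnect."""
    with socket.create_connection((host, port), timeout=5) as sock:   # connect
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))  # send
        chunks = []
        while True:
            data = sock.recv(1024)               # receive until the server disconnects
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)                      # socket already closed: disconnect


def demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)                       # do not wait forever for a client
    listener.bind(("127.0.0.1", 0))              # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    try:
        return fetch("127.0.0.1", port, "/index.html")
    finally:
        server.join()
        listener.close()
```

Calling `demo()` runs both sides in one program (a thread stands in for the server process) and returns the raw response bytes, beginning with the status line.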
2.3 Event Synchronization
A main difficulty with IPC is that the processes involved execute independently, with neither process knowing what takes place in the process at the other end.
Consider basic HTTP, as described above. As you can see, the two sides involved in the protocol must issue the IPC operations in a specific order. For example, the browser process must not issue the send operation until the connect operation has completed. It is also important that the Web server does not begin to transmit the requested data until the browser process is ready to receive.
Furthermore,
the browser process needs to be notified when the requested data has been
received so that it may subsequently process the data, including formatting and
displaying the data to the browser user.
The simplest way for an IPC facility to provide for event synchronization is by using blocking, which is the suspension of the execution of a process until an operation issued by the process has been completed.
To
illustrate the use of blocking for event synchronization, consider again basic
HTTP. A browser process issues a blocking connect operation, which blocks
further execution of the process until the connection has been acknowledged by
the server side. Subsequently, the browser process issues a blocking receive
operation, which suspends execution of the process until the operation is
completed (whether successfully or not). The blocking or unblocking is
performed by the operating system and is initiated by the IPC facilities, not
by the programmer. The programs for the two processes are shown in Figure 2.4.
The
blocking will terminate subsequently when the operation is fulfilled, at which
time the process is said to be unblocked. An unblocked process transits to the
ready state and will resume execution in time. In the event that the operation
cannot be fulfilled, a blocked process will experience indefinite blocking and
will remain in the blocked state indefinitely, unless intervening measures are
taken.
Blocking
operations are also referred to as synchronous
operations. Alternatively, IPC operations may be asynchronous or nonblocking
operations. An asynchronous operation issued by a process will not cause
blocking and therefore the process is free to continue with its execution once
the asynchronous operation is issued to the IPC facility. The process will
subsequently be notified by the IPC facility when and if the operation is
fulfilled.
A
nonblocking or asynchronous operation can be issued by a process when that process
may proceed without waiting for the completion of the event that the operation
initiates. For example, the receive operation issued by the Web browser must be
blocking, because the browser process must wait for the response from the Web
server in order to proceed with further processing. On the other hand, the send
operation issued by the Web server can be nonblocking, because the Web server
need not wait for the completion of the send operation before proceeding with
the next operation (the disconnect), so that it may proceed to service other
Web browser processes.
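The difference between the two kinds of receive can be seen in a small sketch using a connected pair of sockets from Python's `socket.socketpair()`. The function name and the status strings are hypothetical; the point is only that a nonblocking receive returns control at once (here by raising `BlockingIOError`), while a blocking receive waits for data.

```python
import socket


def nonblocking_then_blocking():
    """Issue a nonblocking receive before any data exists, then a blocking one."""
    a, b = socket.socketpair()        # two connected endpoints in one program
    try:
        b.setblocking(False)          # nonblocking mode
        try:
            b.recv(1024)
            first = "data"
        except BlockingIOError:       # nothing to read: control returns immediately
            first = "no data yet"

        a.sendall(b"hello")           # now the other side sends
        b.setblocking(True)           # blocking mode
        second = b.recv(1024)         # suspends until data is available
        return first, second
    finally:
        a.close()
        b.close()
```

Here the nonblocking attempt reports that no data is available yet, and the later blocking receive returns the bytes that were sent.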
Fig 2.4 Program flow in 2 programs involved in an IPC
2.4 Different types of send and receive operations in an IPC
2.4.1 Synchronous Send and Synchronous Receive
Figure 2.5 is a diagram that illustrates the event synchronization for a protocol session implemented using synchronous send and synchronous receive operations. In this scenario, a receive operation issued causes the suspension of the issuing process (process 2) until data is received to fulfill the operation. Likewise, a send operation issued causes the sending process (process 1) to suspend. When the data sent has been received by process 2, the IPC facility on host 2 sends an acknowledgment to the IPC facility on host 1, and process 1 may subsequently be unblocked.
The
use of synchronous send and synchronous receive is warranted if the application
logic of both processes requires that the data sent must be received before
further processing can proceed.
Depending
on the implementation of the IPC facility, the synchronous receive operation
may not be fulfilled until the amount of data the receiver expects to receive
has arrived. For example, if process 2 issues a receive for 300 bytes of data,
and the send operation brings only 200 bytes, it is possible for the blocking
of process 2 to continue even after the first 200 bytes have been delivered; in
such a case, process 2 will not be unblocked until process 1 subsequently sends
the remaining 100 bytes of data.
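With Python stream sockets, a single `recv` call may return fewer bytes than requested, so a receiver that needs exactly 300 bytes typically loops until they have all arrived. The following is a minimal sketch of the 200 + 100 byte case above; `recv_exactly` and `demo` are hypothetical helper names, and a thread stands in for process 1.

```python
import socket
import threading
import time


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Block until exactly n bytes have been received, or fail if the peer closes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))     # may return fewer bytes than asked for
        if not chunk:
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)


def demo() -> bytes:
    sender, receiver = socket.socketpair()

    def send_in_two_parts() -> None:
        sender.sendall(b"a" * 200)          # the first send brings only 200 bytes
        time.sleep(0.1)
        sender.sendall(b"b" * 100)          # the remaining 100 bytes arrive later

    t = threading.Thread(target=send_in_two_parts)
    t.start()
    try:
        return recv_exactly(receiver, 300)  # stays blocked until all 300 arrive
    finally:
        t.join()
        sender.close()
        receiver.close()
```

The receiver remains blocked after the first 200 bytes and is released only once the remaining 100 bytes have been sent.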
Fig 2.5 Synchronous Send and Synchronous Receive
2.4.2 Asynchronous Send and Synchronous Receive
Fig 2.6 Asynchronous Send and Synchronous Receive
Figure 2.6 illustrates an event diagram for a protocol session implemented using asynchronous send and synchronous receive operations. As before, a receive operation issued will cause the suspension of the issuing process until data is received to fulfill the operation. However, a send operation issued will not cause the sending process to suspend. Because the sending process is never blocked, no acknowledgment is needed from the IPC facility on the host of process 2. This combination is appropriate if the sender's application logic does not depend on the data being received at the other end. However, depending on the implementation of the IPC facility, there is no guarantee that the data sent will actually be delivered to the receiver. For example, if the send operation is executed before the corresponding receive operation is issued on the other side, it is possible that the data will not be delivered to the receiving process unless the IPC facility makes provisions to retain the prematurely sent data.
2.4.3 Synchronous Send and Asynchronous Receive
An
asynchronous receive operation causes no blocking of the process that issues
the operation, and the outcome will depend on the implementation of the IPC
facility. The receive operation will, in all cases, return immediately, and
there are three scenarios for what happens subsequently:
Scenario 1: The data requested by the
receive operation has already arrived at the time when the receive operation is
issued. In this case, the data is delivered to process 2 immediately and an
acknowledgment from host 2's IPC facility will unblock process 1.
Scenario 2: The data requested by the receive operation has not yet arrived and thus no data is delivered to the process. It is the receiving process's responsibility to confirm that it has indeed received the data and, if necessary, to repeat the receive operation until the data has arrived. Process 1 remains blocked until process 2 reissues a receive request and an acknowledgment eventually arrives from host 2's IPC facility.
Scenario 3: The data requested by the
receive operation has not yet arrived. The IPC facility of host 2 will notify
process 2 when the data it requested has arrived, at which point process 2 may
proceed to process the data. This scenario requires that process 2 provide a
listener or event handler that can be invoked by the IPC facility to notify the
process of the arrival of the requested data.
Fig 2.7 Synchronous Send and Asynchronous Receive
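Scenarios 2 and 3 can be sketched in Python with a nonblocking socket and the standard `selectors` module. This is an illustration only: `poll_once`, `on_data`, and `demo` are hypothetical names, and the selector plays the role of the IPC facility that notifies the process by invoking a handler.

```python
import selectors
import socket


def poll_once(sock: socket.socket):
    """Scenario 2: try to receive without blocking; None means 'not yet arrived'."""
    try:
        return sock.recv(1024)
    except BlockingIOError:
        return None


def demo():
    sender, receiver = socket.socketpair()
    receiver.setblocking(False)
    received = []

    def on_data(sock: socket.socket) -> None:
        # Scenario 3: handler invoked when the requested data has arrived.
        received.append(sock.recv(1024))

    sel = selectors.DefaultSelector()
    sel.register(receiver, selectors.EVENT_READ, data=on_data)
    try:
        early = poll_once(receiver)              # nothing has been sent yet
        sender.sendall(b"requested data")
        # Wait for readiness events and dispatch each one to its handler.
        for key, _events in sel.select(timeout=1.0):
            key.data(key.fileobj)
        return early, received
    finally:
        sel.unregister(receiver)
        sel.close()
        sender.close()
        receiver.close()
```

The early poll finds nothing, while the registered handler is called once the data is available. In a real program the `select` call would usually sit in an event loop that also serves other work.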
2.4.4 Asynchronous Send and Asynchronous Receive
Without
blocking on either side, the only way that the data can be delivered to the
receiver is if the IPC facility retains the data received. The receiving
process can then be notified of the data's arrival (see Figure 2.8).
Alternatively, the receiving process may poll for the arrival of the data and
process it when the awaited data has arrived.
Fig 2.8 Asynchronous Send and Asynchronous Receive
2.5 Timeouts and Threading
Although
blocking provides the necessary synchronization for IPC, it is generally
unacceptable to allow a process to be suspended indefinitely. There are two
measures to address this issue. First, timeouts may be used to set a maximum
time period for blocking. Timeouts are provided by the IPC facility and may be
specified in a program with an operation. Second, a program may spawn a child
process or a thread to issue a blocking operation, allowing the main thread or
parent process of the program to proceed with other processing while the child
process or child thread is suspended. Figure 2.9 illustrates this use of a
thread.
Timeouts are
important if the execution of a synchronous operation has the potential of
resulting in indefinite blocking. For example, a blocking connect request can
result in the requesting process being suspended indefinitely if the connection
is unfulfilled or cannot be fulfilled as a result of a breakdown in the network
connecting the two processes. In such a situation, it is typically unacceptable
for the requesting process to "hang" indefinitely. Indefinite
blocking can be avoided by using a timeout. For example, a timeout period of 30
seconds may be specified with the connect request. If the request is not
completed within approximately 30 seconds, it will be aborted by the IPC
facility, at which time the requesting process will be unblocked, allowing it
to resume processing.
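As a sketch of how a timeout bounds a blocking operation, Python sockets accept a timeout via `settimeout`, after which a blocked `recv` raises `socket.timeout` instead of waiting forever. The helper name `receive_with_timeout` and the short 0.2-second period are assumptions chosen for illustration; a connect timeout can be given in a similar way through the `timeout` argument of `socket.create_connection`.

```python
import socket


def receive_with_timeout(sock: socket.socket, seconds: float):
    """Blocking receive that gives up after `seconds`; None signals a timeout."""
    sock.settimeout(seconds)
    try:
        return sock.recv(1024)
    except socket.timeout:            # the operation was aborted, process unblocked
        return None


def demo():
    a, b = socket.socketpair()
    try:
        nothing = receive_with_timeout(b, 0.2)     # no sender yet: times out
        a.sendall(b"late data")
        something = receive_with_timeout(b, 0.2)   # data is waiting: returns at once
        return nothing, something
    finally:
        a.close()
        b.close()
```

After the timeout the process regains control and can decide what to do next, for example retry or report an error.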
Fig 2.9 Using a thread for a blocking operation
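The use of a thread for a blocking operation might look like the following sketch, in which a child thread issues the blocking receive while the main thread carries on with other processing. The names and the placeholder "other work" (a simple sum) are hypothetical.

```python
import queue
import socket
import threading


def demo():
    a, b = socket.socketpair()
    results = queue.Queue()

    def blocking_receive() -> None:
        results.put(b.recv(1024))       # only this child thread is suspended here

    worker = threading.Thread(target=blocking_receive)
    worker.start()

    # The main thread is free to continue while the worker is blocked.
    work_done = sum(range(1000))

    a.sendall(b"data for the worker")   # eventually the awaited data arrives
    worker.join(timeout=5)
    data = results.get(timeout=5)
    a.close()
    b.close()
    return work_done, data
```

The main thread completes its own computation regardless of how long the worker waits, and collects the received data through a queue afterwards.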
2.6 Deadlocks and Timeouts
Indefinite
blocking may also be caused by a deadlock. In IPC, a deadlock can result from
operations that were issued improperly, perhaps owing to a misunderstanding of
a protocol, or owing to programming errors.
Fig 2.10 A deadlock caused by blocking operations
Figure
2.10 illustrates such a case. In process 1, a blocking receive operation is
issued to receive data from process 2. Concurrently, process 2 issues a
blocking receive operation where a send operation was intended. As a result,
both processes are blocked awaiting data sent from the other, which can never
occur (since each process is now blocked). As a result, each process will be
suspended indefinitely until a timeout occurs, or until the operating systems
abort the processes.
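The deadlock described above can be reproduced in miniature. In this sketch two threads stand in for the two processes, each issues a blocking receive and neither ever sends; a timeout (an assumed 0.3 seconds) is what eventually releases them. All names are hypothetical.

```python
import socket
import threading


def receive_or_give_up(sock, name, outcomes, timeout):
    """Issue a blocking receive; record whether data arrived or the timeout fired."""
    sock.settimeout(timeout)
    try:
        sock.recv(1024)
        outcomes[name] = "received"
    except socket.timeout:
        outcomes[name] = "timed out"


def demo(timeout: float = 0.3) -> dict:
    s1, s2 = socket.socketpair()
    outcomes = {}
    threads = [
        # Process 1 correctly waits for data from process 2 ...
        threading.Thread(target=receive_or_give_up,
                         args=(s1, "process 1", outcomes, timeout)),
        # ... but process 2 also issues a receive where a send was intended.
        threading.Thread(target=receive_or_give_up,
                         args=(s2, "process 2", outcomes, timeout)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    s1.close()
    s2.close()
    return outcomes
```

Both sides end with "timed out". Without the timeout, both receives would wait on each other indefinitely.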
2.7 Data Representation
At
the physical layer (that is, the lowest layer, as opposed to the application
layer, which is the highest) of the network architecture, data is transmitted
as analog signals, which represent a binary stream. At the application layer, a
more complex representation of transmitted data is needed in order to support
data types and data structures provided in programming languages, such as
character strings, integers, floating point values, arrays, records, and
objects.
Consider an integer value that needs to be sent by process 1. This value is represented in the integer representation of Host A, which is a 32-bit machine that uses the "big-endian" representation for multi-byte data types. (The terms big-endian and little-endian refer to which bytes are most significant in multi-byte data types. In big-endian architectures, the leftmost bytes [those with a lower address] are most significant. In little-endian architectures, the rightmost bytes are most significant.)
Host B, on the other hand, is a 16-bit machine that uses "little-endian" representation. Suppose the value is sent as a 32-bit stream directly from process 1's memory storage and placed into process 2's memory location. Then the value must be reduced to 16 bits, since an integer value occupies only 16 bits on host B, and the byte order of the integer representation must be swapped in order for the value to be interpreted correctly by process 2.
As this example shows, when heterogeneous hosts are involved in IPC, it is not enough to transmit data values or structures using raw bit streams unless the participating processes take measures to package and interpret the data appropriately. For our example, there are three schemes for doing so:
1. Prior to
issuing the send operation, process 1 converts the value of the integer to the
16-bit, little-endian data representation of process 2.
2. Process 1
sends the data in 32-bit, big-endian representation. Upon receiving the data,
process 2 converts it to its 16-bit, little-endian representation.
3. The processes exchange the data in an external representation: data will be sent using this representation, and the data received will be interpreted using the external representation and converted to the native representation.
Some well-known external data representation schemes are:
- Sun XDR (External Data Representation)
- ASN.1 (Abstract Syntax Notation One)
- XML (Extensible Markup Language)
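The byte-order problem and the third scheme can be illustrated with Python's standard `struct` module. This is a sketch under assumptions: the sample value and the helper names are hypothetical, and network byte order (big-endian, which XDR also uses for its 32-bit integers) stands in for the agreed external representation.

```python
import struct

value = 0x12345678                        # sample 32-bit integer

big = struct.pack(">I", value)            # big-endian bytes:    12 34 56 78
little = struct.pack("<I", value)         # little-endian bytes: 78 56 34 12

# Interpreting big-endian bytes with the wrong byte order gives a different number.
misread = struct.unpack("<I", big)[0]     # 0x78563412


def to_external(n: int) -> bytes:
    """Sender: native int -> agreed external form (32-bit, network byte order)."""
    return struct.pack("!i", n)


def from_external(data: bytes) -> int:
    """Receiver: external form -> native int, whatever the host's own byte order."""
    return struct.unpack("!i", data)[0]


def fits_in_16_bits(n: int) -> bool:
    """A 16-bit host can hold the value only if it lies in the signed 16-bit range."""
    return -32768 <= n <= 32767
```

Because both sides convert to and from the same external form, neither needs to know the other's native representation. A receiver with 16-bit integers still has to check that the value fits before storing it.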
2.8 Data Encoding
Although
customized programs can be written to perform IPC using any mutually agreed
upon scheme of data marshaling, general-purpose distributed applications
require a universal, platform-independent scheme for encoding the exchanged
data. Hence there exist network data encoding standards.
At the simplest level, an encoding scheme such as External Data Representation (XDR) allows a selected set of programming data types and data structures to be specified with IPC operations. The data marshaling and unmarshaling are performed automatically by the IPC facilities on the two ends, transparent to the programmer.
At a higher level of abstraction, standards such as ASN.1 (Abstract Syntax Notation One) exist. ASN.1 is an Open Systems Interconnection (OSI) standard that specifies a transfer syntax for representing network data. The standard covers a wide range of data structures (such as sets and sequences) and data types (such as integer, Boolean, and characters) and supports the concept of data tagging. Each data item transmitted is encoded using syntax that specifies its type, its length, its value, and optionally a tag to identify a specific way of interpreting the syntax.
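The type, length, and value idea can be sketched as follows. This is a deliberately simplified encoder in the spirit of ASN.1's basic encoding rules, not a complete or authoritative implementation: it handles only non-negative integers and short strings with lengths up to 127 bytes, and the function names are hypothetical. The tag numbers used are the standard universal tags for INTEGER (0x02) and UTF8String (0x0C).

```python
INTEGER_TAG = 0x02        # universal tag for INTEGER
UTF8_STRING_TAG = 0x0C    # universal tag for UTF8String


def encode_tlv(tag: int, value: bytes) -> bytes:
    """Prefix the value bytes with a one-byte type tag and a one-byte length."""
    if len(value) > 127:
        raise ValueError("this sketch only handles short-form lengths (0-127)")
    return bytes([tag, len(value)]) + value


def encode_integer(n: int) -> bytes:
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    length = n.bit_length() // 8 + 1          # leaves room for the sign bit
    return encode_tlv(INTEGER_TAG, n.to_bytes(length, "big", signed=True))


def encode_string(text: str) -> bytes:
    return encode_tlv(UTF8_STRING_TAG, text.encode("utf-8"))


def decode_tlv(data: bytes):
    """Return (tag, value bytes, remaining bytes) for the first item in data."""
    tag, length = data[0], data[1]
    return tag, data[2:2 + length], data[2 + length:]
```

For example, the integer 300 is encoded as the four bytes 02 02 01 2C: the type (INTEGER), the length (2), and the value (0x012C). Because every item carries its own type and length, a receiver can walk through a byte stream item by item without prior knowledge of its layout.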
At an even higher level of abstraction, the Extensible Markup Language (XML) has emerged as a data description language for data sharing among applications, primarily Internet applications, using syntax similar to the Hypertext Markup Language (HTML), which is the language used for composing Web pages. XML goes one step beyond ASN.1 in that it allows a user to use customized tags (such as the tags <message>, <to>, and <from>) to specify a unit of data content. XML can be used to facilitate data interchange among heterogeneous systems, to segregate the data content of a Web page (written in XML) from the display syntax (written in HTML), and to allow the data to be shared among applications.
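A message using customized tags like those mentioned above can be built and parsed with Python's standard `xml.etree.ElementTree` module. This is a small sketch: the `<body>` tag and the function names are hypothetical additions for illustration.

```python
import xml.etree.ElementTree as ET


def build_message(sender: str, recipient: str, text: str) -> str:
    """Marshal a message into XML text using customized tags."""
    root = ET.Element("message")
    ET.SubElement(root, "to").text = recipient
    ET.SubElement(root, "from").text = sender
    ET.SubElement(root, "body").text = text      # <body> is an assumed extra tag
    return ET.tostring(root, encoding="unicode")


def parse_message(xml_text: str) -> dict:
    """Unmarshal the XML text back into a dictionary keyed by tag name."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}
```

Because the tags name each unit of data, the receiving application can pick out the fields it understands regardless of the platform that produced the text.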
2.9 Text-based protocols
Data
marshaling is at its simplest when the data exchanged is a stream of characters
or text encoded using a representation such as ASCII. Exchanging data in text
has the additional advantage that the data can be easily parsed in a program
and displayed for human perusal. Hence it is a popular practice for protocols
to exchange requests and responses in the form of character strings. Such
protocols are said to be text-based. Many popular network protocols, including
FTP (File Transfer Protocol), HTTP and SMTP (Simple Mail Transfer Protocol) are
text-based.
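To show why text is easy to marshal and parse, here is a minimal sketch that formats and parses a request line in the style of basic HTTP. The helper names are hypothetical, and real protocols carry additional header lines that this sketch ignores.

```python
def format_request(method: str, path: str, version: str = "HTTP/1.0") -> bytes:
    """Marshal a request as ASCII text terminated by a blank line."""
    return f"{method} {path} {version}\r\n\r\n".encode("ascii")


def parse_request_line(raw: bytes):
    """Unmarshal the first line of a text request into its three fields."""
    first_line = raw.decode("ascii").split("\r\n", 1)[0]
    method, path, version = first_line.split(" ")
    return method, path, version
```

The request is an ordinary character string, so it can be printed for inspection and split into fields with basic string operations.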
2.10 Request-Response Protocols
An
important type of protocol is the request-response protocol. In this protocol,
one side issues a request and awaits a response from the other side.
Subsequently, another request may be issued, which in turn elicits another
response. The protocol proceeds in an iteration of request-response, until the
desired task is completed. The popular network protocols FTP, HTTP, and SMTP
are all request-response protocols.
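The iteration of request and response can be sketched with a toy line-based protocol. Everything here is an assumption made for illustration: the "OK" reply format, the "QUIT" request that ends the session, the function names, and the use of a thread in place of a second process.

```python
import socket
import threading


def responder(sock: socket.socket) -> None:
    """Answer each request line with one response line until QUIT or end of input."""
    with sock, sock.makefile("r", encoding="ascii", newline="\n") as reader, \
            sock.makefile("w", encoding="ascii", newline="\n") as writer:
        while True:
            request = reader.readline().rstrip("\n")   # blocking receive
            if request in ("", "QUIT"):
                break
            writer.write(f"OK {request.upper()}\n")    # send the response
            writer.flush()


def requester(sock: socket.socket, requests) -> list:
    """Issue each request and wait for its response before sending the next."""
    responses = []
    with sock, sock.makefile("r", encoding="ascii", newline="\n") as reader, \
            sock.makefile("w", encoding="ascii", newline="\n") as writer:
        for request in requests:
            writer.write(request + "\n")               # send the request
            writer.flush()
            responses.append(reader.readline().rstrip("\n"))  # await the response
        writer.write("QUIT\n")
        writer.flush()
    return responses


def demo() -> list:
    a, b = socket.socketpair()
    server = threading.Thread(target=responder, args=(b,))
    server.start()
    try:
        return requester(a, ["hello", "world"])
    finally:
        server.join()
```

Each round consists of one request and one response, and the requester does not issue the next request until the previous response has arrived.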
2.11 Event Diagram and Sequence Diagram
An event diagram is a diagram that can be used to document the detailed sequence of events and blocking during the execution of a protocol. Figure 2.11 is an event diagram for a request-response protocol involving two concurrent processes, A and B. The execution of each process with respect to time is represented using a vertical line, with time increasing downward. A solid line interval along the execution line represents a time period during which the process is active. A broken line interval represents when the process is blocked. In the example, both processes are initially active. Process B issues a blocking receive operation in anticipation of request 1 from process A.
Fig 2.11 Event Diagram of a protocol
Process
A meanwhile issues the awaited request 1 using a nonblocking send operation,
then subsequently a blocking receive operation in anticipation of process B's
response. The arrival of request 1 reactivates process B, which processes the
request before issuing a send operation to transmit response 1 to process A.
Process B then issues a blocking receive for request 2 from process A. The
arrival of response 1 unblocks process A, which resumes execution to work on
the response and to issue request 2, which unblocks Process B. A similar
sequence of events follows.
Note
that each round of request-response entails two pairs of send and receive
operations to exchange two messages. The protocol can extend to any number of
rounds of exchange using this pattern.
Figure
2.15 uses an event diagram to describe basic HTTP. In its basic form, HTTP is a
text-based, request-response protocol that calls for only one round of exchange
of messages. A Web server process is a process that constantly listens for
incoming requests from Web browser processes. A Web browser process makes a
connection to the server, and then issues a request in a format dictated by the
protocol. The server processes the request and dispatches a response composed
of a status line, header information, and the document requested by the browser
process. Upon receiving the response, the browser process parses the response
and displays the document.
An event diagram is a useful device for illustrating the synchronization events. A simplified form of diagram, known as a sequence diagram and part of the UML notation, is more commonly used to document interprocess communications. In a sequence diagram, the execution flow of each participant of a protocol is represented as a dashed line and does not differentiate between the states of blocked and executing. Each message exchanged between the two sides is shown using a directed line between the two dashed lines, with a descriptive label above the directed line, as illustrated in Figure 2.12.
Fig 2.12 A Sequence Diagram
The sequence diagram for basic HTTP is shown in Figure 2.13.
Fig 2.13 The sequence diagram for HTTP
2.12 Connection-oriented versus Connectionless IPC
Using a connection-oriented IPC facility, two processes establish a connection (which may be logical, that is, implemented in software, rather than physical), then exchange data by inserting data into and extracting data from the connection. Once a connection is established, there is no need to identify the sender or the receiver. Using a connectionless IPC facility, data is exchanged in independent packets, each of which needs to be addressed specifically to the receiver.
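The contrast can be sketched with Python sockets, taking datagram (UDP) sockets as an example of a connectionless facility and a connected stream pair as an example of a connection-oriented one. The function names and the loopback address are assumptions for illustration.

```python
import socket


def datagram_demo() -> list:
    """Connectionless: every packet is addressed explicitly to the receiver."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
    receiver.settimeout(2)                     # do not wait forever for a packet
    address = receiver.getsockname()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(b"first", address)       # address supplied with each send
        sender.sendto(b"second", address)
        return [receiver.recvfrom(1024)[0] for _ in range(2)]
    finally:
        sender.close()
        receiver.close()


def stream_demo() -> bytes:
    """Connection-oriented: once connected, no address is given per operation."""
    a, b = socket.socketpair()                 # an already established connection
    try:
        a.sendall(b"no address needed")        # data is inserted into the connection
        return b.recv(1024)                    # and extracted at the other end
    finally:
        a.close()
        b.close()
```

In the datagram case each packet travels independently and carries its destination, whereas in the stream case the connection itself identifies the two parties.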
2.13 The evolution of paradigms for interprocess communication
At the least abstract level, IPC involves the transmission of a binary stream over a connection, using low-level serial or parallel data transfer. This IPC paradigm may be appropriate for network driver software, for instance.
At
the next level is a well-known paradigm called the socket application program interface (the socket API). Using the socket
paradigm, two processes exchange data using a logical construct called a
socket, one of which is established at either end. Data to be sent is written
to the socket. At the other end, a receiving process reads or extracts data
from its socket.
The
Remote Procedure Call (RPC) or Remote Method Invocation (RMI) paradigm
provides further abstraction by allowing a process to make procedure calls or
method invocations to a remote process, with data transferred between the two
processes as arguments and return values.