Understanding the concepts behind asynchronous messaging

REST, gRPC, and GraphQL are all API paradigms, and, as discussed in my article series, they are very different from each other. But they do have one thing in common - they are all examples of synchronous communication. And if you are wondering whether this is about async/await in .NET - no, I am not talking about that!

In synchronous communication, the sender of a request has to wait for a response from the receiver before it can carry out meaningful work. The sender relies on the receiver to receive the message, process it, and return a response before it can take further action. If the receiver is busy or experiencing an outage, the sender might not be able to carry out its operations. HTTP, being ubiquitous, is the most common transport for all three API patterns. But “The network is reliable” is in fact the first fallacy of distributed computing. So, what happens when you have an HTTP timeout? You never know what is going on at the receiver’s end. In fact, you can’t even be sure that the request reached the receiver. Retries can be implemented, and they work well for idempotent operations, but what if the operation is payment-related? I wouldn’t be happy if my credit card was charged multiple times as part of the retries!

With synchronous communication, the sender and receiver must both be up, running, and available at all points in time. This is called temporal coupling: the two communicating parties are coupled in terms of timing. Synchronous communication patterns have a high degree of temporal coupling. Think of a phone call - both the caller and the receiver have to be available at the same time for the call to happen.

 

Let’s get introduced to asynchronous messaging. It is a communication pattern where the sender and receiver operate independently of each other. The sender sends a message to a “queue”, and the receiver processes it when it is available - immediately, or at some point in the future that cannot be predicted. This reduces temporal coupling, as the sender and receiver don’t have to be available at the same time for processing to happen. The sender sends a request, and the receiver processes it at a different time, when it can. It also means that the sender does not wait for a response from the receiver - there is no response to block on - so both can continue with other operations. It’s “fire and forget”. Think of writing a letter and dropping it in a post box. The postal department takes care of the rest for you.

 

Asynchronous messaging is not a new concept. It dates back to the 1960s, and with the emergence of message-oriented middleware in the 1980s and 1990s, it started to gain traction. Microservices and Service-Oriented Architecture further accelerated its growth. With IoT and real-time applications more popular than ever before, asynchronous messaging continues to gain strength.

 

Message Queue

With asynchronous messaging, many development concerns like retries, fault handling, decoupling code, and single responsibility become part of your infrastructure. And the most critical piece of that infrastructure is the message queue. A message queue can be considered a database, but instead of storing and indexing your data, it focuses on delivering your messages from one component to another. The sender is only responsible for sending messages to the message queue; delivering the message to the receiver is the responsibility of the queue. Communication is one-way, and the sender does not know the receiver at all. The introduction of a message queue leads to very loose temporal coupling. The sender and receiver become two distinct applications that can be deployed and scaled separately - scaling horizontally also becomes possible with this communication pattern in place.

Message queues are a reliability feature for your systems. A message queue accepts messages, stores them, and delivers them reliably. Your messages are safe on their way from one place to another - even when the consumer of a message has issues or an outage, the message will be delivered once the consumer is back up or ready to process it.

 

With asynchronous messaging, we are closer to the realm of integrated applications. The sender and receiver are two different, stand-alone applications. Each application has one well-defined responsibility. Any related functionality is the responsibility of another application or receiver. But they still communicate with each other asynchronously via messaging.

 

What are the steps involved?

  1. The sender sends a message to the queue
  2. The receiver reads the message from the queue
  3. The receiver processes the message
  4. The message gets removed from the queue

This is the simplest scenario. It involves 4 steps, which might seem unnecessary given that you can achieve the same thing in a single RPC call. But what happens when the sender or receiver is having trouble? What if the sender and receiver are fine but there are network connectivity issues? What if your message gets lost in such a network call? Numerous systems cannot afford even the tiniest loss of messages. With asynchronous messaging in place, the messages are stored durably in the queue and can be retried, ensuring both storage and delivery of your messages.
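The four steps above can be sketched with a minimal in-process example. This is only an illustration - it uses Python’s standard-library queue as a stand-in for a real message queue, and the message shape is hypothetical:

```python
from queue import Queue

# Stand-in for a real message queue; in production this would be a
# broker or queue service such as RabbitMQ or Azure Service Bus.
message_queue = Queue()

# Step 1: the sender puts a message on the queue and moves on.
message_queue.put({"type": "TakePayment", "order_id": 42, "amount": 19.99})

# Step 2: the receiver reads the message from the queue...
message = message_queue.get()

# Step 3: ...processes it...
print(f"Processing {message['type']} for order {message['order_id']}")

# Step 4: ...and acknowledges it, removing it from the queue.
message_queue.task_done()
```

In a real system, steps 1 and 2 happen in different processes, possibly at very different times - that gap is exactly where the temporal decoupling comes from.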

Now, if you have an extensive system with many senders and receivers, how does that work? Is your message queue one big bucket from which messages are retrieved? No. There is typically a one-to-one relationship between message types and message queues. A sender always sends a specific message type to a specific message queue, and the receiver on the other side of that queue is interested only in that message type.

Message queues are logical concepts and are part of a more comprehensive messaging system.

 

Message

A message is a data record - the unit of communication in asynchronous messaging. You can think of it as an envelope with the contents placed inside it.

A message has two parts - a metadata header and a payload body. Headers are key-value pairs, much like HTTP headers, and carry information about the payload.

The body of the message has the actual content that needs to be transmitted and can be JSON, XML, plain text, or even binary. A message usually has an application-defined identifier that uniquely identifies it; this ID is generally found in the header. The headers can also include a CorrelationId, a timestamp, the message type, routing information, etc. Message queues typically enforce a limit on the message size. For example, Azure Storage Queues have a maximum message size of 64 KB, whereas Azure Service Bus can handle messages up to 100 MB in its Premium tier.
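The header/body split can be modelled with a small sketch. The `Message` class and the header names below are hypothetical - real queue SDKs each have their own message type - but the shape is the same: key-value headers describing a payload body:

```python
import json
import uuid
from dataclasses import dataclass, field

# Hypothetical message shape: metadata headers plus a payload body.
@dataclass
class Message:
    body: bytes
    headers: dict = field(default_factory=dict)

    def __post_init__(self):
        # Ensure every message carries a unique identifier in its headers.
        self.headers.setdefault("MessageId", str(uuid.uuid4()))

# The body carries the actual content - here, a JSON-encoded payload.
payload = json.dumps({"orderId": 42, "amount": 19.99}).encode()
msg = Message(body=payload,
              headers={"MessageType": "TakePayment",
                       "ContentType": "application/json"})
```

The receiver can inspect the headers (for routing, correlation, deserialization) without touching the body, just as HTTP intermediaries do with HTTP headers.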

A message by itself does not communicate intent. With REST APIs, for example, the HTTP verbs (GET, POST, PUT, DELETE) convey the intent. With messaging, the intent must be made explicit. The two types of messages - commands and events - help with that.

Command

A command indicates that something needs to happen. It is an instruction: some action or processing needs to be taken when it is received. That action can be receiving the message, performing some business logic, and perhaps sending another message to another receiver - so commands can build up a chain of actions, a workflow. A command is a high-value message and should be processed only once. Multiple senders may send commands, but a single receiver will always process a given command.

Event

An event is another type of message, communicating that something has happened. It is immutable and talks about the past - it states facts! An event is usually “published” by the sender, and receivers can “subscribe” to events and take action. Unlike commands, an event always has a single sender but can be consumed by one or many receivers.

Both commands and events are regular messages; it is the intent they communicate that differentiates them. This difference in intent also means that the style of using them differs greatly. 
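The difference in intent usually shows up directly in how the message types are named: commands are imperative (“do this”), events are past tense (“this happened”). A minimal sketch, with hypothetical type names:

```python
import dataclasses
from dataclasses import dataclass

@dataclass(frozen=True)
class TakePayment:
    """Command: an instruction that something needs to happen."""
    order_id: int
    amount: float

@dataclass(frozen=True)
class PaymentTaken:
    """Event: an immutable fact about something that already happened."""
    order_id: int
    amount: float

cmd = TakePayment(order_id=42, amount=19.99)   # sent to one handler
evt = PaymentTaken(order_id=42, amount=19.99)  # published to any subscribers
```

Marking both as frozen reflects that a message, once sent, is a record that should not be mutated - this is especially important for events, which are facts.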

 

Communication Styles in Asynchronous Messaging

There are multiple styles of communication with asynchronous messaging.

 

Point-to-Point Channel

A point-to-point channel ensures that a given message is only processed by one receiver. This pattern is unidirectional communication, and it can be used to deliver specific messages to a single, specific receiver. Messages used in point-to-point interactions are commands.

Think of an online retail shop where you are shopping. You have a basket full of items. At the point of check-out, the shop can send a message to the Payment Service saying “Take the payment”. Let us call it the “TakePayment” message. It indicates that some action needs to happen, i.e. payment has to be taken.

 

Here, the sender (shop) knows which service (the Payment Service) can handle that command. The sender does not know the internals of the Payment Service at all; it only knows that the Payment Service is interested in the TakePayment message. The sender does not know whether the Payment Service is busy, up, or down, or any other specifics about it. It only knows that the TakePayment message should be sent to the Payment Service, and only the Payment Service can process the payments. There is a very specific audience in this case.

It is possible that your distributed system experiences a surge in traffic and, thereby, a message buildup. In such cases, multiple receiver instances can read from the same queue. This is a common scenario, called the Competing Consumers pattern. Even then, this communication style ensures that only one instance of the receiver processes a given message - and the receivers do not need to coordinate with each other to achieve this.
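A minimal sketch of Competing Consumers, again using Python’s in-process queue and threads as stand-ins for receiver instances (a real deployment would run separate processes against a broker):

```python
import queue
import threading

work = queue.Queue()
processed = []           # (worker name, message) pairs, for inspection
lock = threading.Lock()

def consumer(name):
    # Each competing consumer pulls from the same queue. The queue hands
    # every message to exactly one consumer - no coordination required.
    while True:
        try:
            msg = work.get(timeout=0.1)
        except queue.Empty:
            return  # queue drained, this worker exits
        with lock:
            processed.append((name, msg))
        work.task_done()

for i in range(10):
    work.put(f"message-{i}")

workers = [threading.Thread(target=consumer, args=(f"worker-{n}",))
           for n in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
# All 10 messages were processed, each by exactly one of the 3 workers.
```

Adding a fourth worker requires no change anywhere else - which is exactly why this pattern makes scaling out so easy.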

Request/Reply Pattern

Although messaging is unidirectional communication, sometimes the sender needs a reply from the receiver, e.g., to get information from another component. This can be achieved using two one-way messages: the sender sends a message to the receiver; the receiver does some processing and sends back a reply. This might sound similar to request-response, but the difference is that it is asynchronous - there is no guarantee that the reply arrives immediately. Also note that the receiver usually uses a different message queue to send the reply, making this pattern two one-way point-to-point communications and keeping temporal coupling low. Special headers connect the reply message to the request message.
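The “special headers” are typically a correlation identifier and a reply address. A sketch with two in-process queues standing in for the two one-way channels (header names here follow common convention but are assumptions, not a specific product’s API):

```python
import uuid
from queue import Queue

requests, replies = Queue(), Queue()  # two separate one-way channels

# Sender: tag the request with a correlation id and where to reply.
correlation_id = str(uuid.uuid4())
requests.put({"headers": {"CorrelationId": correlation_id,
                          "ReplyTo": "replies"},
              "body": "GetExchangeRate USD/EUR"})

# Receiver: process the request, then send a reply on the reply queue,
# copying the correlation id so the sender can match reply to request.
req = requests.get()
replies.put({"headers": {"CorrelationId": req["headers"]["CorrelationId"]},
             "body": "0.92"})

# Sender (later, asynchronously): correlate the reply with its request.
reply = replies.get()
assert reply["headers"]["CorrelationId"] == correlation_id
```

Because the reply can arrive at any time, the sender typically keeps a map of outstanding correlation ids rather than blocking on the reply queue.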

Publish-Subscribe Pattern

With the Publish-Subscribe pattern, the sender (publisher) sends a message that is consumed by all receivers (subscribers). The sender does not know the receivers at all - it neither knows nor cares how many there are or who they are. The sender publishes a message, and all subscribers receive it and can process it in different ways. It is also possible that a published event has no subscribers. This pattern can be useful while migrating from synchronous communication patterns to messaging: more receivers can be added later without changing the sender, and all messages published from that point onwards will be delivered to the newly added receiver.

In the above example, the Payment Service could publish a “PaymentTaken” event after processing a “TakePayment” command. A LoyaltyPoints Service and a Shipping Service could both be subscribed to the event. The LoyaltyPoints Service could use the event to update loyalty points, and the Shipping Service could use it to initiate a shipping workflow.

Note that the publish-subscribe pattern is different from Competing Consumers. With Competing Consumers, only one receiver processes a given message; with publish-subscribe, all subscribers receive the message and can process it.
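One common way brokers implement this distinction is a queue per subscription: publishing an event delivers a copy to every subscriber’s queue, while consumers competing on a single queue split one stream of messages. A sketch of the fan-out side, with hypothetical subscription names:

```python
from queue import Queue

# One queue per subscriber: publish-subscribe fans a copy out to each.
subscriptions = {"loyalty": Queue(), "shipping": Queue()}

def publish(event):
    # Deliver a copy of the event to every subscription.
    for sub_queue in subscriptions.values():
        sub_queue.put(event)

publish({"type": "PaymentTaken", "order_id": 42})

# Each subscriber receives its own copy and processes it independently.
loyalty_event = subscriptions["loyalty"].get()
shipping_event = subscriptions["shipping"].get()
```

If the LoyaltyPoints Service itself ran multiple instances, those instances would compete on the “loyalty” queue - so the two patterns compose naturally.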

Message Endpoints

We have been using the terms sender and receiver until now. The sender and receiver could be standalone applications with their own user-facing functionality, so it makes sense to encapsulate the part of the application that communicates with the message queue away from the rest of the application. That part of your application is the message endpoint. On the sender side, an endpoint takes data, converts it into a message, and sends it to a queue. On the receiver side, an endpoint receives messages, examines their content, and helps with the processing. You can think of endpoints as the clients of your message queue.

 

Visualising a distributed system with asynchronous messaging

Message Queue v/s Message Broker

Message Broker is another term often used in Asynchronous Messaging. There are differences between Message Queues and Message Brokers, but the difference has narrowed considerably with the advent of PaaS products. 

A message queue facilitates communication between the sender and receiver by sending, receiving, and storing messages using the queue data structure. The messages follow the First-In-First-Out order.

Message brokers have more features than queues. They extend message queues with additional capabilities like routing, message filtering, transformation, and communication patterns such as publish-subscribe and point-to-point channels. Unlike message queues, message brokers can inspect the messages that pass through them.

 

Architectures

Asynchronous Messaging can be used to achieve several architectures. 

  • Event-driven architecture
  • Microservices architecture
  • N-tier architecture
  • Web-queue-worker architecture

Benefits of Asynchronous Messaging

Decoupling

With asynchronous messaging, the sender and receiver have low temporal coupling. The sender can send messages into the queue, and the receiver can process them when it is free to do so. There is no time-bound dependency on both being available simultaneously. If a message takes a long time to process on the receiver’s end, the sender can continue working and keep sending messages into the queue, where they are stored durably. The sender and receiver operate independently, each at its maximum throughput, without wasting time waiting for the other.

Performance

Anytime we use a synchronous communication model, there’s a risk of an “epic fail” scenario: the system starts running out of threads and memory, the garbage collector suspends threads more frequently, the system does more housekeeping than business work, and soon after, it fails. You are always “almost there” with a synchronous communication model. With asynchronous messaging, you are not bound to HTTP and its timeout problem - communication generally uses a protocol like AMQP - and the throughput of asynchronous messaging systems can exceed that of synchronous communication models.

Scaling & Load Balancing

With busy systems, the number of messages in your queue can be high. This does not send receivers into a frenzy, as message queues often use a “pull model”: receivers check the queue for new messages and process them as they are read, rather than having messages pushed to them. If, at any point, the queue begins to grow and you want faster processing, you can add receivers dynamically to spread the load; the message queue still ensures that each message is processed only once, by a single, specific receiver. This is the Competing Consumers pattern. Receivers can also be removed once the surge in traffic has been dealt with. This balances the workload and increases the availability and responsiveness of your system.

Load Leveling

With surges in traffic, you can use the message queue as a buffer, with receivers draining the messages at their own pace rather than scaling out. This ensures that despite more incoming requests, the processing throughput remains constant. In addition to keeping your systems responsive and available, this helps control costs as you only need adequate resources to manage an average load, not peak loads.

Recoverability - Retries & Fault Tolerance

With asynchronous messaging, failures in senders or receivers do not affect each other. Messages persist, and even messages that cannot be processed due to errors remain in the system. This ensures that data is never lost. Failure in one part of your distributed system does not affect your entire system; it is isolated and can be worked upon and fixed. This makes your system resilient. 

Transient errors are caused by concurrency conflicts, network issues, etc.; they are not caused by business logic. Immediate retries can help resolve issues caused by concurrency. With network-related issues that might need more time to resolve, delayed retries can help. Different retry strategies can be used with delayed retries, like fixed interval and exponential backoff.
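The two delayed-retry strategies mentioned above boil down to a simple delay calculation. A sketch, with the interval, base, and cap values chosen purely for illustration:

```python
def fixed_interval(attempt: int, delay: float = 5.0) -> float:
    """Fixed interval: the same delay before every retry attempt."""
    return delay

def exponential_backoff(attempt: int, base: float = 1.0,
                        cap: float = 60.0) -> float:
    """Exponential backoff: the delay doubles with each attempt,
    capped at a maximum so it never grows unbounded."""
    return min(cap, base * (2 ** attempt))

delays = [exponential_backoff(a) for a in range(7)]
# -> [1, 2, 4, 8, 16, 32, 60]: doubling until the 60-second cap kicks in
```

Many messaging libraries also add random jitter to the backoff delay so that a burst of failing messages does not retry in lockstep; that refinement is omitted here for clarity.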

Persistent errors, on the other hand, could be caused by business logic. Messages that fail get moved into error queues or dead-letter queues. After the cause of the issue has been fixed, the messages in error queues can be retried and processed.

Consistency

Data consistency can be ensured by using message queues that participate in transactions or by patterns like the Outbox Pattern.
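The core idea of the Outbox Pattern is that the business change and the outgoing message are written in the same local transaction, so either both happen or neither does; a separate relay later publishes the stored messages to the queue. A minimal sketch using SQLite - the table names and message text are hypothetical:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, message TEXT)")

# One atomic transaction: the business write and the outbox write
# commit together or roll back together.
with db:
    db.execute("INSERT INTO orders (status) VALUES ('paid')")
    db.execute("INSERT INTO outbox (message) VALUES ('PaymentTaken')")

# A relay process would poll the outbox, publish each message to the
# broker, and delete it only after the broker confirms receipt.
pending = db.execute("SELECT message FROM outbox").fetchall()
```

This avoids the classic failure mode where the database commit succeeds but the publish to the queue fails (or vice versa), leaving the two out of sync.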

Reliability

Asynchronous messaging increases your system's reliability. The communication model removes the dependency on connectivity between sender and receiver by introducing message queues for durable message storage. If a receiver loses connectivity, it only needs to reconnect to the message queue and can start processing any pending messages; no data is lost. Unless the message queue itself is down, the sender is never blocked from sending messages - and most modern message queues are highly available, reducing this risk further.

Resiliency

Asynchronous messaging adds to system resiliency. If a receiver fails while processing a message, the message is put back in the queue and can be picked up later or by another receiver. If a sender fails to send a message, the atomicity of transactions can help roll back the entire transaction.

Challenges

Complex Programming Model

The programming model with asynchronous messaging is complex. There is no single thread of execution, which means testing and debugging can be complex. It can give rise to many edge-case scenarios. The development effort is not trivial, either. The shift from synchronous communication to asynchronous messaging is a rather big one.

Message Ordering Issues

Message queues guarantee delivery, but not the time of delivery. This means messages can arrive out of sequence - and with retry mechanisms in play, they can always end up out of order. The system therefore has to be developed so that message order is of no consequence.

Steep Learning Curve

This is a personal one. I have spent the first two decades of my career focusing on synchronous communication patterns. While I understand the basic concepts well, I have only seen the tip of the iceberg here!

Tools

Some of the popular message queues/message brokers:

  • Azure Service Bus (Message Broker)
  • Azure Storage Queue (Message Queue)
  • RabbitMQ (Message Broker)
  • Amazon SQS (Message Queue)

You can also choose abstractions like NServiceBus, where the message queues/brokers are abstracted away and the library works with the queueing system for you.

Use Cases

  • Financial Services
  • Retail
  • Healthcare
  • Industrial Automation

When To Use?

  • Communication between internal services where each service has a set service boundary, where it has all the data that it needs to operate in isolation
  • Communication between modules in a monolith to make the modules completely decoupled
  • When a third-party service does the processing, messaging can be used to instruct an endpoint to issue an RPC call to the service. The processing is carried out asynchronously, and some load leveling can be achieved for busy services.



