Getting started with Amazon SQS in .NET

A message queue is a highly available infrastructure that can be hosted on-prem, or on cloud. Consider an application A talking to an application B. It is common to use an HTTP API for application A to request information or even send some data to application B for processing. But what happens when application B is down? Application A cannot get any information it needs for processing, neither can it send any data to application B for processing. Any data that is in flight to application B is lost, which makes things worse because that information is lost forever unless you develop retry mechanisms.

Consider the same applications now done a little bit differently. We have the same application A and application B but instead of them communicating to each other, let us have application send a message, a payload, to our message queue. Application A can move to sending another message into the queue, continue processing the data it is supposed to process. The messages are durably stored in the message queue till they are consumed by application B. We now have application A and B that are independent of each other and loosely coupled. With the message queue acting as temporary storage of data till the messages are processed, we have made the system more reliable. Even if application B is down, it only needs to reconnect to the message queue and it can start processing messages again. You can read more about messaging and the core concepts in one of my previous blogs.

Amazon SQS - Understanding the basics

Amazon SQS uses redundant infrastructure. The messages are stored on multiple servers. The queue in SQS is therefore a logical container for messages. AWS handles with storing and managing the messages across their distributed infrastructure. As developers, we interact with our queue as if it is a single, logical queue while the physical location of the queue or the servers storing the messages remains transparent to you.

Queue Types in SQS

Amazon SQS supports two types of queues - standard and FIFO queues.

Standard Queues support a very high, nearly unlimited throughput, making them useful for use cases requiring processing of large volumes of messages. The order is not guaranteed; it is best-effort ordering. Messages could arrive out of order despite the attempts of SQS to deliver them in order when there is failure recovery or high throughput conditions. Standard queues offer at-least-once delivery, which means messages can be delivered more than once when there are delays or retries. Therefore the applications must be designed to be idempotent. Standard queues are highly durable, which is ensured by storing multiple copies of each message across multiple AWS availability zones. Standard queues must be used when throughput is crucial.

FIFO queues ensure ordering and exactly-once processing. The messages arrive in the order they are sent, and each message is delivered exactly once. FIFO queues have limited number of transactions per second, this can be raised to 30000 by enabling high throughput mode. FIFO queues are ideal for scenarios where the order of processing the message has to be strict, and duplicates cannot be tolerated. High throughput mode must only be used when you have the requirement coupled with strict message order.

FIFO queues increase the complexity of the systems with their strict message ordering, prevention of duplicate messages and handling of message grouping. So consider using them in the scenarios where these features are a must-have for your system.

Let us jump into code. We will come back and talk more about queue features later on. It's Christmas time, so I am going to automate a part of Santa’s workshop. I have set up two queues for this purpose - toyrequests and createdtoys. It is important that Santa's workshop can handle requests for toys from all the kids around the world. The order of request handling is less important as Santa will be delivering them all on Christmas. It is more important that high volume of requests are handled here so we will use SQS standard queues.

Here is the workflow

Requests for toys are send to the “toyrequests” queue
Elves creates the toys, adds an entry to the database and then send another message about the created toy to the “createdtoys” queue
The processed message is deleted from the “toyrequests” queue

I have two .NET Generic Host-based apps that run a background service. The app SantasWorskhop.RequestToys sends the request for toys to the toyrequests queue. The SantasWorkshop.CreateToys app receives the toy requests from the queue, processes the queue and then sends a message about the finished toy to the createdtoys request.

In my example code today I am using the .NET SDK for Amazon SQS. You can also use libraries like NServiceBus using Amazon SQS as a transport.

The AmazonSQSClient class in the SDK helps interact with the queue. It requires a secret access key, an access key id and the queue url to talk to the queues. I am storing them as dotnet user secrets for local development purposes. Authenticating the SDK to work with SQS is a big topic in itself, worthy of an article or two, so I am not going into the details here. But there are plenty of resources out there, which I will link to in the resources section.

First let us look at our SantasWorskhop.RequestToys app. The Program.cs file is fairly simple and straightforward as shown below.

HostApplicationBuilder builder = Host.CreateApplicationBuilder(args);

builder.Configuration.AddUserSecrets<Program>().Build();

builder.Services.AddHostedService<RequestToyService>();

IHost host = builder.Build();
await host.RunAsync();

The RequestToyService is the background service which does the heavy lifting. I am using the IConfiguration to get the access key id and secret stored in the user secrets. I use the secrets to create an instance of the AmazonSQSClient . Inside the loop, I get the child name and the toy from the values entered and use the SendMessageAsync to send the RequestToyMessage message to the queue.

The RequestToyMessage is a very simple C# record.

public record RequestToyMessage(string childName, string toyName);

Not so difficult, right? If I run the app and start polling for messages in the AWS Console, I can see the messages.

Let us now focus on the SantasWorkshop.CreateToys app that consumes the messages. The Program.cs is again very simple with the CreateToyService doing all the heavy lifting. In addition to the access key, secret and the queue URL for the toyrequests queue, we also need the queue URL for the createdtoys queue to which the messages about the created toys would be sent. There could be another similar app that processes those messages and sends instructions to load them to Santa’s sleigh.

HostApplicationBuilder builder = Host.CreateApplicationBuilder(args);

builder.Configuration.AddUserSecrets<Program>().Build();

builder.Services.AddHostedService<CreateToyService>();

IHost host = builder.Build();
await host.RunAsync();

public class CreateToyService(IConfiguration configuration) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var secret = configuration.GetValue<string>("AWS_SECRET_ACCESS_KEY");
        var accessKey = configuration.GetValue<string>("AWS_ACCESS_KEY_ID");
        var toyRequestQueueUrl = configuration.GetValue<string>("AWS_QUEUE_URL");
        var createdToysQueueUrl = configuration.GetValue<string>("AWS_FINISHTOYS_QUEUE_URL");
        var regionId = configuration.GetValue<string>("AWS_REGION");
        
        var sqsClient = new AmazonSQSClient(accessKey, secret, RegionEndpoint.GetBySystemName(regionId));

        while (!stoppingToken.IsCancellationRequested)
        {
            //recieve messages from the toyrequests queue
            var request = new ReceiveMessageRequest()
            {
                QueueUrl = toyRequestQueueUrl,
            };
            var response = await sqsClient.ReceiveMessageAsync(request,  stoppingToken);
            
            foreach(var message in response.Messages)
            {
                var toyRequest = JsonSerializer.Deserialize<RequestToyMessage>(message.Body);
                
                //idempotent elf logic to create the present, store it in the database and retrieve a present Id
                //for now we will simply create a new guid
                var presentId = Guid.NewGuid();
                
                Console.WriteLine($"{toyRequest.toyName} processed for child {toyRequest.childName}");

                //send the finishedtoy message to the createdtoys queue
                var finishedToyRequest = new FinishToyRequest(presentId);
                await sqsClient.SendMessageAsync(new SendMessageRequest()
                {
                    MessageBody = JsonSerializer.Serialize(finishedToyRequest),
                    QueueUrl = createdToysQueueUrl,
                });

                //explicitly delete the processed toyrequest message
                await sqsClient.DeleteMessageAsync(new DeleteMessageRequest()
                {
                    QueueUrl = toyRequestQueueUrl,
                    ReceiptHandle = message.ReceiptHandle
                });

            }
           
        }
    }
    
}

Run both apps now, and we can see the toy requests being processed and the FinishToyRequest message sent to the next queue.

The FinishToyRequest message is, again, a very simple POCO.

public record FinishToyRequest(Guid PresentId);

In my code example above, I create a new guid every time a RequestToyMessage is processed. This is not idempotent. With the at-least-once delivery mode of SQS standard queues, you need to ensure that the logic is idempotent. Otherwise, the elves could end up making a lot of extra toys for each such message, resulting in a lot of extra toys and overworked elves. In the case of a distributed system like this, idempotency is important so that retries do not have unintended side-effects and the system remains consistent.

As you can also see, I am explicitly deleting a message after processing it. This requires the queue url from which the message should be deleted and also a ReceiptHandle which is a SQS concept. Every receipt of a message from the queue also gives you a ReceiptHandle. It is an identifier associated with the receive action rather than the message itself. With standard queues, with at least one delivery, there is the possibility of each message being received more than once. With each such receive action, the handle varies. To delete messages in such instances, you must use the most recent receipt handle.

So, that is a simple workflow automated with SQS. Now, let us look at some more concepts around SQS.

Message lifecycle in Amazon SQS

A component(producer) sends a message to the queue.
A component B(consumer) retrieves the message from the queue. This starts the visibility timeout period. While the message is being processed, it remains in the queue but hidden temporarily for the duration of the visibility timeout period so that other components cannot receive it for processing.
Once the message has been processed, the consumer deletes the message so that the message is not available to any other component for processing after the visibility timeout period.

Picture Reference : What is Amazon Simple Queue Service - Amazon Simple Queue Service

When a message is sent to the queue, SQS redundantly stores the message in multiple availability zones before acknowledging it. This ensures that messages are never inaccessible. When receiving messages, you cannot specify individual messages to be retrieved, but you can specify a maximum number of messages to receive in a single receive action. The maximum number of messages that can be received in a single action is 10. To remove a message from a queue, an explicit delete request should be issued. Therefore a message must always be received before deleting, you cannot recall a message once it has been sent.

With the messages being distributed, receive actions could sometimes bring back empty responses. Rerunning the request might return a response, but this can introduce unnecessary costs. This is where long polling can help.

Long Polling v/s Short Polling

With short polling, SQS samples a subset of its servers(based on a weighted distribution) and returns messages from only those servers. So a request might not return all of your messages but a subsequent receive message request might. This works well in a scenario where you have few messages. When you keep consuming messages, SQS eventually samples all the servers, and you receive all the messages. Setting the Receive message wait time enables short polling, which is the default value.

Setting the Receive message wait time to a non-zero value enables long polling. The maximum value possible is 20s. With long polling, SQS samples all of its servers and waits to send a response till a message becomes available or the polling times out. This ensures that additional costs incurred due to polling can be eliminated, as SQS will not send an empty response (when there are no messages) or a false, empty response (when messages are available but not included, like in the short polling scenario).

While choosing between the two polling options, the cost efficiency and the responsiveness of your system must be taken into account. The Receive message wait time also requires some adjusting with it not set to too low to avoid the rare scenario of empty responses.

Queue Configuration

There are other queue parameters that can be configured with SQS.

Visibility Timeout

The length of time for which a message is received by a consumer(in-flight messages) remains hidden from other consumers. This ensures that consumers cannot process a message that is being worked on. The visibility timeout starts as soon as a message is delivered to the consumer. It is expected that the message is processed and the message is deleted from the queue during this period. If the consumer does not request to delete it before the visibility timeout is over, the message will go back into the queue and can be retrieved by another consumer. The default value of this setting is 30s. The setting specified in this screen affects the entire queue, but it is possible to adjust the visibility timeout on a per-message/multiple-message basis using the SDK. It is important to set an appropriate value for this setting based on the expected processing time so that messages reappear in the queue and retries of the message is not delayed.

Message Retention Period

The amount of time for which SQS retains messages that remain in the queue. By default it is 4 days.

Delivery Delay

The amount of time for which SQS will delay the delivery of a message when it is added to the queue. When consumers needs additional time to process messages, delay queues can help postpone the delivery of new messages to consumers. By default the value is 0. Delay queues and visibility timeout are similar concepts. While visibility timeout ensures a message is hidden after it is consumed from the queue, delay queues hides a message when it is first added to the queue. The setting on the screen is applicable to the entire queue although it is possible to control it at a message level using the SDK

Maximum message size

Maximum size of the message for the queue. The maximum allowed is 256KB. For messages more than 256KB, consider managing them using S3 buckets

Receive message wait time

This setting controls the short polling and long polling. Setting this to a non-zero value enables long polling. By definition, this is the amount of time SQS will wait for messages to become available after the queue gets a receive message request.

Resources

Setting up Amazon SQS - Amazon Simple Queue Service

Amazon SQS - Understanding the basics

Queue Types in SQS

Message lifecycle in Amazon SQS

Long Polling v/s Short Polling

Queue Configuration

Visibility Timeout

Message Retention Period

Delivery Delay

Maximum message size

Receive message wait time

Podcast and blog about my journey in tech

Understanding the concepts behind asynchronous messaging

REST : Myths & Facts

Getting started with Amazon SQS in .NET

Amazon SQS - Understanding the basics

Queue Types in SQS

Message lifecycle in Amazon SQS

Long Polling v/s Short Polling

Queue Configuration

Visibility Timeout

Message Retention Period

Delivery Delay

Maximum message size

Receive message wait time

Share

Podcast and blog about my journey in tech

Understanding the concepts behind asynchronous messaging

REST : Myths & Facts