Joe Dyndale

Reputation: 1071

How to fix being limited to 1 client reading from Azure ServiceBus

I have something that's been quite the head scratcher for me...

I have a small service that reads messages from an Azure ServiceBus queue and stores the data in a CosmosDB collection.

The problem is that I can't get my service to scale. I have been able to optimize things to improve the number of messages read per second for one instance of the service. However, adding more instances of the service slightly degrades the number of messages read per second in total!

It's important to note that sending messages to the queue in batches works like a charm; I can send 1000-2000 messages per second to the queue without any issues. Reading from the queue is the problem.

My handler is slightly CPU intensive, and the messages range from approximately 2 KB to 900 KB in size, the average being somewhere around 25 KB. I've gotten one instance to handle approximately 41.5 messages per second now.

If I add a second instance of the service (which is an Azure Web App, by the way), the total number of messages read per second across all instances drops to approximately 40. Adding yet another instance brings it down closer to 38.

The actual code that reads messages from the queue (and handles retries, deadlettering etc.) is part of an internal company framework, which a lot of other services use, none of which have this issue. Other services have the expected behavior that performance scales linearly with the number of service instances (up to the max that ServiceBus can handle, obviously).

I have the same problem on two different Azure subscriptions (TEST and PROD) which both use the Premium ServiceBus tier.

I am not using sessions on the queue.

Has anyone here ever had a similar issue, and how did you solve it?

Things I've tried:

The only resources shared between instances of my web app are ServiceBus and CosmosDB, and as noted above, I've ruled out CosmosDB. Since I'm having the same issue in both our TEST and PROD subscriptions (our DEV subscription doesn't allow scaling out), and I've recreated the queue a few times in various ways, it can't be the queue itself either. None of the other queues in use on the same ServiceBus instance have this issue.

Tweaking and optimizing the code has, as expected, only had an impact on the performance of a single instance. As far as I can tell, the possible external bottlenecks have been ruled out. The one remaining suspect, our internal framework that handles the actual reading of messages from the queue, has also been ruled out, since the exact same version of the framework is used in many other web apps where scaling out has been demonstrated to work.

I feel pretty checkmated here...

SOLUTION: I forgot to update this question, so at last here it is... We eventually managed to set aside time to focus entirely on this problem, and through various tests we concluded that the cause was a combination of using the ReadBatchAsync method in the SDK and having rather large messages. Switching to OnMessageAsync fixed it.
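For reference, here is a minimal sketch of the message-pump style that fixed it, assuming the older WindowsAzure.ServiceBus SDK (Microsoft.ServiceBus.Messaging), which is where OnMessageAsync lives; the connection string, queue name, concurrency value and ProcessMessageAsync handler are placeholders:

using System;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

class QueueWorker
{
    private readonly QueueClient _queueClient =
        QueueClient.CreateFromConnectionString("<connection string>", "<queue name>");

    public void StartPump()
    {
        var options = new OnMessageOptions
        {
            MaxConcurrentCalls = 16, // tune to the instance's CPU budget
            AutoComplete = false     // complete manually after a successful handle
        };
        options.ExceptionReceived += (s, e) => Console.Error.WriteLine(e.Exception);

        // The SDK runs the receive loop and pumps messages into the callback,
        // instead of the caller polling for batches.
        _queueClient.OnMessageAsync(async message =>
        {
            await ProcessMessageAsync(message); // hypothetical handler
            await message.CompleteAsync();
        }, options);
    }

    private Task ProcessMessageAsync(BrokeredMessage message)
    {
        // ... handler logic (CPU work, CosmosDB write) ...
        return Task.CompletedTask;
    }
}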

Upvotes: 3

Views: 824

Answers (2)

Ilya Chernomordik

Reputation: 30335

I would suggest first eliminating the possibility that the handling code is the problem. Try running with a dummy StartProcessMessage that does nothing, to ensure the handler isn't the bottleneck (e.g. too many writers writing to some shared resource, or something similar).
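For example, a no-op stand-in along these lines (a hypothetical sketch; the exact signature depends on your framework) would show whether receive throughput scales once the handler does no work:

// Hypothetical no-op stand-in for the real handler, used only to measure receive throughput.
void StartProcessMessage(Message m)
{
    // Intentionally empty: no CPU work, no CosmosDB write, no shared resources.
}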

Another option you can try is the newer .NET library, Microsoft.Azure.ServiceBus. The classes available there let you run a built-in receive loop with MaxConcurrentCalls in a more natural and easier way. But ensuring it's not the handler is the first thing you should try; if you already have, maybe you should share it.
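For illustration, a rough sketch of that built-in loop with Microsoft.Azure.ServiceBus; the handler, concurrency value, connection string and queue name are placeholders:

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

class Receiver
{
    private readonly QueueClient _queueClient =
        new QueueClient("<connection string>", "<queue name>");

    public void Start()
    {
        var options = new MessageHandlerOptions(OnExceptionAsync)
        {
            MaxConcurrentCalls = 16, // concurrency per instance
            AutoComplete = true
        };

        // The client owns the receive loop and invokes the handler concurrently.
        _queueClient.RegisterMessageHandler(HandleAsync, options);
    }

    private Task HandleAsync(Message message, CancellationToken token)
    {
        // ... handler logic goes here ...
        Console.WriteLine($"Received message of {message.Body.Length} bytes");
        return Task.CompletedTask;
    }

    private Task OnExceptionAsync(ExceptionReceivedEventArgs args)
    {
        Console.Error.WriteLine(args.Exception);
        return Task.CompletedTask;
    }
}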

Upvotes: 1

Nkosi

Reputation: 247451

It is usually not a good idea to have async void operations.

Additionally, you could refactor the processing so that it is invoked in batches as well.

The first approach assumes StartProcessMessage cannot be made async:

void StartProcessMessage(Message m) {
    //... handler logic; assumed to decrement _messagesInProgress (e.g. Interlocked.Decrement) when it finishes.
}

public async Task Start() {
    while (true) {
        // Receive only as many messages as the remaining concurrency budget allows.
        var messages = (await _queueClient.ReceiveBatchAsync(
            Math.Max(1, _configuration.MaxConcurrentCalls - _messagesInProgress))).ToArray();
        Interlocked.Add(ref _messagesInProgress, messages.Length);
        var tasks = messages.Select(m => Task.Run(() => StartProcessMessage(m)));
        await Task.WhenAll(tasks); // process the batch in parallel.
        // Throttle until in-flight work drops back under the configured limit.
        while (_messagesInProgress > _configuration.MaxConcurrentCalls) {
            await Task.Delay(100);
        }
    }
}

The second approach assumes that StartProcessMessage can be refactored to be async:

Task StartProcessMessage(Message m) {
    //... async handler logic; assumed to decrement _messagesInProgress when it completes.
}

public async Task Start() {
    while (true) {
        // Receive only as many messages as the remaining concurrency budget allows.
        var messages = (await _queueClient.ReceiveBatchAsync(
            Math.Max(1, _configuration.MaxConcurrentCalls - _messagesInProgress))).ToArray();
        Interlocked.Add(ref _messagesInProgress, messages.Length);
        var tasks = messages.Select(m => StartProcessMessage(m));
        await Task.WhenAll(tasks); // process the batch in parallel.
        // Throttle until in-flight work drops back under the configured limit.
        while (_messagesInProgress > _configuration.MaxConcurrentCalls) {
            await Task.Delay(100);
        }
    }
}

Upvotes: 1
