Dev

Reputation: 345

RabbitMQ cluster node failure with spring boot application

I have a Spring Boot application that is connected to a RabbitMQ cluster (as a service in Cloud Foundry). When the main node in the cluster fails and, for some reason, does not come back up, the application (message consumer) keeps trying to connect to the failed node instead of trying the other available nodes. Could someone suggest some Spring configuration to fix this issue?

17:36:23.829: [APP/PROC/WEB.0] Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node '[email protected]' of durable queue 'FAILED_ORDER' in vhost '/' is down or inaccessible, class-id=50, method-id=10)

'[email protected]' is the failed node.

In order to keep trying to connect to the nodes on failure, I have the following Spring configuration:

spring.rabbitmq.listener.simple.missing-queues-fatal=false

import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MessageConfiguration {

    public static final String FAILED_ORDER_QUEUE_NAME = "FAILED_ORDER";

    public static final String EXCHANGE = "directExchange";

    @Bean
    public Queue failedOrderQueue() {
        // Durable, non-exclusive, non-auto-delete queue
        return new Queue(FAILED_ORDER_QUEUE_NAME);
    }

    @Bean
    public DirectExchange directExchange() {
        // durable = true, autoDelete = false
        return new DirectExchange(EXCHANGE, true, false);
    }

    @Bean
    public Binding secondBinding(Queue failedOrderQueue, DirectExchange directExchange) {
        return BindingBuilder.bind(failedOrderQueue).to(directExchange).with(FAILED_ORDER_QUEUE_NAME);
    }

}

Upvotes: 1

Views: 1575

Answers (1)

Gary Russell

Reputation: 174739

This can happen when you are using a non-HA auto-delete queue with an incorrect master locator.

If the master locator is not client-local, the auto-delete queue might be created on a different node from the one the application is connected to. In that case, if the hosting node goes down, you will get this problem.

To avoid this problem with auto-delete queues, set the x-queue-master-locator queue argument to client-local or set a policy on the broker to do the same for queues matching this name.
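In Spring AMQP, that queue argument can be set when declaring the queue, for example with QueueBuilder. This is only a sketch: the queue name is taken from the question, and it assumes the queue really is meant to be auto-delete.

```java
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.QueueBuilder;

// Declare an auto-delete queue whose master is placed on the node
// the declaring connection is attached to ("client-local").
Queue queue = QueueBuilder.nonDurable("FAILED_ORDER")
        .autoDelete()
        .withArgument("x-queue-master-locator", "client-local")
        .build();
```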

However, you are not using an auto-delete queue...

@Bean
public Queue failedOrderQueue(){
    return new Queue(FAILED_ORDER_QUEUE_NAME);
}

When using a cluster, and a non-HA queue, the queue is not replicated and so, if the owning node goes down, you will get this error until the owning node comes back up.

To avoid this problem, set a policy to make the queue a mirrored (HA) queue.

https://www.rabbitmq.com/ha.html
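For example, a classic mirrored-queue policy matching this queue could be applied on the broker with rabbitmqctl. The policy name ha-failed-order is arbitrary; adjust the vhost and pattern to your setup.

```shell
# Mirror the FAILED_ORDER queue across all cluster nodes in vhost "/"
rabbitmqctl set_policy -p / ha-failed-order "^FAILED_ORDER$" \
    '{"ha-mode":"all","ha-sync-mode":"automatic"}' \
    --apply-to queues
```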

Upvotes: 2
