Dev

Reputation: 345

RabbitMQ cluster node failure with spring boot application

I have a Spring Boot application that is connected to a RabbitMQ cluster (as a service in Cloud Foundry). When the main node in the cluster fails and, for some reason, does not come back up, the application (message consumer) keeps trying to connect to the failed node instead of trying the other available nodes. Could someone suggest some Spring configuration to fix this issue?

17:36:23.829: [APP/PROC/WEB.0] Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node '[email protected]' of durable queue 'FAILED_ORDER' in vhost '/' is down or inaccessible, class-id=50, method-id=10)

'[email protected]' is the failed node.

In order to keep trying to connect to the nodes on failure, I have the following Spring configuration:

spring.rabbitmq.listener.simple.missing-queues-fatal=false

import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MessageConfiguration {

    public static final String FAILED_ORDER_QUEUE_NAME = "FAILED_ORDER";

    public static final String EXCHANGE = "directExchange";

    @Bean
    public Queue failedOrderQueue() {
        // Durable, non-exclusive, non-auto-delete queue
        return new Queue(FAILED_ORDER_QUEUE_NAME);
    }

    @Bean
    public DirectExchange directExchange() {
        // durable = true, autoDelete = false
        return new DirectExchange(EXCHANGE, true, false);
    }

    @Bean
    public Binding secondBinding(Queue failedOrderQueue, DirectExchange directExchange) {
        return BindingBuilder.bind(failedOrderQueue).to(directExchange).with(FAILED_ORDER_QUEUE_NAME);
    }

}

Upvotes: 1

Views: 1575

Answers (1)

Gary Russell

Reputation: 174739

This can happen when you are using a non-HA auto-delete queue with an incorrect master locator.

If the master locator is not client-local, the auto-delete queue might be created on a different node from the one the application is connected to. In that case, if the hosting node goes down, you will get this problem.

To avoid this problem with auto-delete queues, set the x-queue-master-locator queue argument to client-local or set a policy on the broker to do the same for queues matching this name.
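In Spring AMQP, that queue argument can be set when declaring the queue, for example with QueueBuilder. This is only a sketch: the queue name is taken from the question, and it assumes the queue really is meant to be auto-delete.

```java
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.QueueBuilder;

// Declare an auto-delete queue whose master is placed on the node
// the declaring connection is attached to ("client-local").
Queue queue = QueueBuilder.nonDurable("FAILED_ORDER")
        .autoDelete()
        .withArgument("x-queue-master-locator", "client-local")
        .build();
```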

However, you are not using an auto-delete queue...

@Bean
public Queue failedOrderQueue(){
    return new Queue(FAILED_ORDER_QUEUE_NAME);
}

When using a cluster, and a non-HA queue, the queue is not replicated and so, if the owning node goes down, you will get this error until the owning node comes back up.

To avoid this problem, set a policy to make the queue a mirrored (HA) queue.

https://www.rabbitmq.com/ha.html
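For example, a classic mirrored-queue policy matching this queue could be applied on the broker with rabbitmqctl. The policy name ha-failed-order is arbitrary; adjust the vhost and pattern to your setup.

```shell
# Mirror the FAILED_ORDER queue across all cluster nodes in vhost "/"
rabbitmqctl set_policy -p / ha-failed-order "^FAILED_ORDER$" \
    '{"ha-mode":"all","ha-sync-mode":"automatic"}' \
    --apply-to queues
```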

Upvotes: 2
