Reputation: 4482

redis-cluster - add-node slave to existing cluster from remote machine hanging forever

I am trying to add a set of 8 replicas running on one host to an existing cluster running on another host.

The replica servers are all running in cluster mode.

When I try to do either:

./redis-trib.rb add-node --slave REPLICA_IP:6380 MASTER_IP:6380

./redis-cli --cluster add-node REPLICA_IP:6380 MASTER_IP:6380 --cluster-slave

I get the same result:

Waiting for the cluster to join...........................

which hangs indefinitely.

The two servers can definitely see each other, and I can connect to any relevant Redis node (replica or master) from either server. The discovery/communication (cluster bus) ports (16380, etc.) are all open and reachable as well. The output of these commands also suggests that the cluster has been found, since it lists each node with its correct node ID.
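For reference, Redis Cluster by default uses the client port plus 10000 for its bus, so a node on 6380 listens on 16380 for cluster traffic. A minimal sketch for checking both ports from the other machine follows. `check_node_ports` is a hypothetical helper name, it assumes an `nc` build that supports `-z` and `-w`, and REPLICA_IP is the placeholder from the question.

```shell
# Cluster bus port for a given client port (Redis default offset: +10000).
cluster_bus_port() {
    echo $(( $1 + 10000 ))
}

# Hypothetical helper: report whether the client port and the bus port of a
# node accept TCP connections. Assumes nc supports -z (scan only) and
# -w (timeout in seconds).
check_node_ports() {
    host=$1
    port=$2
    for p in "$port" "$(cluster_bus_port "$port")"; do
        if nc -z -w 3 "$host" "$p"; then
            echo "$host:$p open"
        else
            echo "$host:$p unreachable"
        fi
    done
}

# Example (placeholder host from the question):
# check_node_ports REPLICA_IP 6380
```

Open ports only show TCP reachability. They do not prove that the cluster handshake itself completes.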

Here is the full output of either add-node command:

>>> Adding node REPLICA_IP:6380 to cluster MASTER_IP:6380
>>> Performing Cluster Check (using node MASTER_IP:6380)
M: 043a5fa4fdca929d3d87f953906dc7c1f030926c MASTER_IP:6380
   slots:[0-2047] (2048 slots) master
M: e104777d31630eef11a01e41c7d3a6c98e14ab64 MASTER_IP:6386
   slots:[12288-14335] (2048 slots) master
M: 9c807d6f57a9634adcdf75fa1943c32c985bda1c MASTER_IP:6384
   slots:[8192-10239] (2048 slots) master
M: 0f7ec07deff97ca23fe67109da2365d916ff1a67 MASTER_IP:6383
   slots:[6144-8191] (2048 slots) master
M: 974e8b4051b7a8e33db62ba7ad62c7e54abe699d MASTER_IP:6382
   slots:[4096-6143] (2048 slots) master
M: b647bb9d732ff2ee83b097ffb8b49fb2bccd366f MASTER_IP:6387
   slots:[14336-16383] (2048 slots) master
M: a86ac1d5e783bed133b153e471fdd970c17c6af5 MASTER_IP:6381
   slots:[2048-4095] (2048 slots) master
M: 6f859b03f86eded0188ba493063c5c2114d7c11f MASTER_IP:6385
   slots:[10240-12287] (2048 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Automatically selected master MASTER_IP:6380
>>> Send CLUSTER MEET to node REPLICA_IP:6380 to make it join the cluster.
Waiting for the cluster to join
............................

If I run CLUSTER MEET manually and then CLUSTER NODES, I can temporarily see another node in state 'handshake' with status 'disconnected', and then it disappears. The node ID it shows does not match the replica's actual node ID.
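As far as I know, a node in the handshake state is listed under a temporary random ID until the handshake completes, and it is dropped again if that does not happen in time, which would match the behaviour described above. A rough sketch for spotting such entries, assuming the standard CLUSTER NODES line layout (flags in field 3, link state in field 8). `handshake_nodes` is a made-up helper name.

```shell
# Print node id, address, and link state for entries still flagged "handshake".
# Reads CLUSTER NODES output on stdin. Field 3 is the comma-separated flag
# list and field 8 is the link state (connected / disconnected).
handshake_nodes() {
    awk 'index("," $3 ",", ",handshake,") > 0 { print $1, $2, $8 }'
}

# Example against the cluster from the question (placeholders, not real hosts):
# redis-cli -h MASTER_IP -p 6380 cluster meet REPLICA_IP 6380
# redis-cli -h MASTER_IP -p 6380 cluster nodes | handshake_nodes
```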

Upvotes: 5

Answers (3)

Garvit Jain

Reputation: 2022

If there is no firewall problem between the nodes, check the bind setting in redis.conf.

You should bind the Redis service to the LAN IP, of course, but there is one more thing:

Delete 127.0.0.1, or move it to the end, after the LAN IP.

For example: bind 10.2.1.x 127.0.0.1 or bind 10.2.1.x


Upvotes: 0

Ramratan Gupta

Reputation: 1086

In my case every node had the same node ID, so it waited forever.

I had configured an EC2 AMI and launched 3 servers from it. Through user-data, a shell script reconfigured the Redis cluster and restarted the server, but each server kept the same node ID as the server from which I had created the AMI.

M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.134.109:6379
   slots:[0-5460] (5461 slots) master
M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.175.235:6379
   slots:[5461-10922] (5462 slots) master
M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.155.10:6379
   slots:[10923-16383] (5461 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
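The node ID is stored in each node's cluster config file (nodes.conf by default), so an image taken from an already configured node can carry the ID along. A small sketch for spotting this in check-style output, assuming lines of the form `M: <id> <ip:port>` like the ones above. `duplicate_node_ids` is a hypothetical helper name.

```shell
# Print every node ID that is reported on more than one "M:" or "S:" line.
# Reads redis-cli --cluster style output on stdin.
duplicate_node_ids() {
    awk '$1 == "M:" || $1 == "S:" { count[$2]++ }
         END { for (id in count) if (count[id] > 1) print id }'
}

# Example (addresses taken from the output above):
# redis-cli --cluster check 10.0.134.109:6379 | duplicate_node_ids
# Alternatively, compare the ID each node reports for itself:
# redis-cli -h 10.0.134.109 -p 6379 cluster myid
```

An empty result means no ID was repeated in the input.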

So on each node I ran CLUSTER RESET HARD, after which it worked.

https://redis.io/commands/cluster-reset

Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
M: 36a129fab85d2aed310bfd7cc141035de420fa92 10.0.134.109:6379
   slots:[0-5460] (5461 slots) master
M: 773bc76e903da27efbd965bca26366fa20878397 10.0.175.235:6379
   slots:[5461-10922] (5462 slots) master
M: 10a79173d1f7a9c568bdfa3b955b6e133d2dceaa 10.0.155.10:6379
   slots:[10923-16383] (5461 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
..
>>> Performing Cluster Check (using node 10.0.134.109:6379)
M: 36a129fab85d2aed310bfd7cc141035de420fa92 10.0.134.109:6379
   slots:[0-5460] (5461 slots) master
M: 773bc76e903da27efbd965bca26366fa20878397 10.0.175.235:6379
   slots:[5461-10922] (5462 slots) master
M: 10a79173d1f7a9c568bdfa3b955b6e133d2dceaa 10.0.155.10:6379
   slots:[10923-16383] (5461 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Upvotes: 1

Christopher Reid

Reputation: 4482

I figured it out:

Using tcpdump, I confirmed that both servers were repeatedly talking to each other on both the Redis server ports and the handshake ports while the add-node command hung.

However, in the Redis config for each node I had:

bind 0.0.0.0

On both the masters and the replicas, the config must instead read:

bind SERVER_IP

in order for CLUSTER MEET to work properly.
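A quick way to audit this across several config files, as a sketch only. The helper names are hypothetical, it assumes a plain `bind <addr> [<addr> ...]` line, and if a file has several `bind` lines it looks at the last one. It also flags a leading 127.0.0.1, which another answer here suggests avoiding.

```shell
# Print the first address of the last "bind" line in a redis.conf file.
first_bind_address() {
    awk '$1 == "bind" { addr = $2 } END { if (addr != "") print addr }' "$1"
}

# Hypothetical check: warn when that address is the wildcard or loopback.
check_bind() {
    addr=$(first_bind_address "$1")
    case "$addr" in
        "")                 echo "no bind directive found" ;;
        0.0.0.0|127.0.0.1)  echo "warning: first bind address is $addr" ;;
        *)                  echo "ok: first bind address is $addr" ;;
    esac
}

# Example (the path is an assumption, adjust to your layout):
# check_bind /etc/redis/redis.conf
```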

Upvotes: 3
