Reputation: 4482
I am trying to connect a cluster of 8 replicas on one address to an existing cluster on another address.
The replica servers are all running in cluster mode.
When I try to do either:
./redis-trib.rb add-node --slave REPLICA_IP:6380 MASTER_IP:6380
or
./redis-cli --cluster add-node REPLICA_IP:6380 MASTER_IP:6380 --cluster-slave
I get the same result:
Waiting for the cluster to join...........................
which hangs indefinitely.
The two servers can definitely see each other, and I can connect to any relevant Redis node (replica or master) from either server. The cluster bus ports (16380, etc.) are all open and reachable as well. The output of these commands also suggests that the cluster has been found, since it lists each node with its correct node ID.
Here is the full output of either add-node command:
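For anyone repeating this reachability check, here is a minimal sketch in Python. It assumes the Redis default that a node's cluster bus port is its data port plus 10000; the helper names (`bus_port`, `can_connect`, `check_node`) are made up for illustration. A successful TCP connect only shows the port is open, not that the cluster handshake will complete.

```python
import socket

# Redis default: the cluster bus listens on data port + 10000.
CLUSTER_BUS_OFFSET = 10000


def bus_port(data_port: int) -> int:
    """Return the default cluster bus port for a given data port."""
    return data_port + CLUSTER_BUS_OFFSET


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node(host: str, data_port: int) -> dict:
    """Probe both the data port and the bus port of one node."""
    return {
        "data": can_connect(host, data_port),
        "bus": can_connect(host, bus_port(data_port)),
    }
```

Calling `check_node("MASTER_IP", 6380)` from the replica host (with the real address substituted) would probe ports 6380 and 16380.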
>>> Adding node REPLICA_IP:6380 to cluster MASTER_IP:6380
>>> Performing Cluster Check (using node MASTER_IP:6380)
M: 043a5fa4fdca929d3d87f953906dc7c1f030926c MASTER_IP:6380
slots:[0-2047] (2048 slots) master
M: e104777d31630eef11a01e41c7d3a6c98e14ab64 MASTER_IP:6386
slots:[12288-14335] (2048 slots) master
M: 9c807d6f57a9634adcdf75fa1943c32c985bda1c MASTER_IP:6384
slots:[8192-10239] (2048 slots) master
M: 0f7ec07deff97ca23fe67109da2365d916ff1a67 MASTER_IP:6383
slots:[6144-8191] (2048 slots) master
M: 974e8b4051b7a8e33db62ba7ad62c7e54abe699d MASTER_IP:6382
slots:[4096-6143] (2048 slots) master
M: b647bb9d732ff2ee83b097ffb8b49fb2bccd366f MASTER_IP:6387
slots:[14336-16383] (2048 slots) master
M: a86ac1d5e783bed133b153e471fdd970c17c6af5 MASTER_IP:6381
slots:[2048-4095] (2048 slots) master
M: 6f859b03f86eded0188ba493063c5c2114d7c11f MASTER_IP:6385
slots:[10240-12287] (2048 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Automatically selected master MASTER_IP:6380
>>> Send CLUSTER MEET to node REPLICA_IP:6380 to make it join the cluster.
Waiting for the cluster to join
............................
If I run CLUSTER MEET manually and then CLUSTER NODES, I can temporarily see another node in the 'handshake' state with link status 'disconnected', and then it disappears. The node ID shown for it does not match the node's actual ID.
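Since the handshake entry disappears quickly, it can help to capture the CLUSTER NODES text and filter it. The sketch below parses that output and returns entries flagged `handshake`. It assumes the documented field order (id, address, flags, master, ping-sent, pong-recv, config-epoch, link-state, then slots); the sample addresses and the second node ID are invented for illustration.

```python
from typing import List, NamedTuple


class NodeEntry(NamedTuple):
    node_id: str
    address: str
    flags: frozenset
    link_state: str


def parse_cluster_nodes(text: str) -> List[NodeEntry]:
    """Parse the text returned by CLUSTER NODES into NodeEntry records."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:  # skip blank or truncated lines
            continue
        entries.append(
            NodeEntry(
                node_id=fields[0],
                address=fields[1],
                flags=frozenset(fields[2].split(",")),
                link_state=fields[7],
            )
        )
    return entries


def stuck_handshakes(entries: List[NodeEntry]) -> List[NodeEntry]:
    """Return the nodes that are still in the handshake state."""
    return [e for e in entries if "handshake" in e.flags]


# Hypothetical sample: one healthy master and one node stuck in handshake.
SAMPLE = """\
043a5fa4fdca929d3d87f953906dc7c1f030926c 10.0.0.1:6380@16380 myself,master - 0 0 1 connected 0-2047
1111111111111111111111111111111111111111 10.0.0.2:6380@16380 handshake - 0 0 0 disconnected
"""

for node in stuck_handshakes(parse_cluster_nodes(SAMPLE)):
    print(node.node_id, node.address, node.link_state)
```

Comparing the printed ID with what CLUSTER MYID reports on the joining node shows whether the two match.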
Upvotes: 5
Views: 9296
Reputation: 2022
If there is no firewall problem between the nodes, check the bind setting in redis.conf.
You should bind the Redis service to the LAN IP, of course, but one more thing: delete 127.0.0.1, or move 127.0.0.1 to the end, after the LAN IP.
For example: bind 10.2.1.x 127.0.0.1
or: bind 10.2.1.x
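As a rough illustration of that reordering, the following Python sketch rewrites a single `bind` line so that loopback addresses come last. The function names are hypothetical and `10.2.1.5` stands in for a real LAN address. It only transforms text and does not touch a running server.

```python
import ipaddress


def is_loopback(addr: str) -> bool:
    """True for loopback addresses such as 127.0.0.1 or ::1."""
    try:
        # Newer Redis versions allow a leading '-' on bind addresses.
        return ipaddress.ip_address(addr.lstrip("-")).is_loopback
    except ValueError:
        return False  # not a literal IP (e.g. '*'), leave it alone


def loopback_last(bind_line: str) -> str:
    """Return the bind line with loopback addresses moved to the end."""
    parts = bind_line.split()
    if not parts or parts[0].lower() != "bind":
        return bind_line  # not a bind directive, return unchanged
    addrs = parts[1:]
    ordered = [a for a in addrs if not is_loopback(a)]
    ordered += [a for a in addrs if is_loopback(a)]
    return " ".join(["bind"] + ordered)


print(loopback_last("bind 127.0.0.1 10.2.1.5"))
```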
Upvotes: 0
Reputation: 1086
In my case every node had the same node ID, so the join waited forever.
I had configured an EC2 AMI and launched 3 servers from it. A user-data shell script reconfigured the Redis cluster and restarted the server, but each server kept the node ID of the instance the AMI was created from:
M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.134.109:6379
slots:[0-5460] (5461 slots) master
M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.175.235:6379
slots:[5461-10922] (5462 slots) master
M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.155.10:6379
slots:[10923-16383] (5461 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
So on each node I ran CLUSTER RESET HARD, and now it works:
https://redis.io/commands/cluster-reset
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
M: 36a129fab85d2aed310bfd7cc141035de420fa92 10.0.134.109:6379
slots:[0-5460] (5461 slots) master
M: 773bc76e903da27efbd965bca26366fa20878397 10.0.175.235:6379
slots:[5461-10922] (5462 slots) master
M: 10a79173d1f7a9c568bdfa3b955b6e133d2dceaa 10.0.155.10:6379
slots:[10923-16383] (5461 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
..
>>> Performing Cluster Check (using node 10.0.134.109:6379)
M: 36a129fab85d2aed310bfd7cc141035de420fa92 10.0.134.109:6379
slots:[0-5460] (5461 slots) master
M: 773bc76e903da27efbd965bca26366fa20878397 10.0.175.235:6379
slots:[5461-10922] (5462 slots) master
M: 10a79173d1f7a9c568bdfa3b955b6e133d2dceaa 10.0.155.10:6379
slots:[10923-16383] (5461 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
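A duplicate-ID situation like the one in the first listing can be spotted automatically. The Python sketch below scans cluster-check output for `M:`/`S:` lines and reports any node ID that appears under more than one address. The function name and regular expression are assumptions made for illustration, based only on the line format shown in this answer.

```python
import re
from collections import defaultdict

# Matches lines such as "M: <hex node id> <ip:port>" from the check output.
NODE_LINE = re.compile(r"^[MS]:\s+([0-9a-f]+)\s+(\S+)")


def duplicate_node_ids(output: str) -> dict:
    """Map each node ID seen at more than one address to those addresses."""
    by_id = defaultdict(list)
    for line in output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            node_id, addr = match.groups()
            if addr not in by_id[node_id]:
                by_id[node_id].append(addr)
    return {nid: addrs for nid, addrs in by_id.items() if len(addrs) > 1}


BROKEN = """\
M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.134.109:6379
M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.175.235:6379
M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.155.10:6379
"""

for node_id, addrs in duplicate_node_ids(BROKEN).items():
    print(f"{node_id} is shared by: {', '.join(addrs)}")
```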
Upvotes: 1
Reputation: 4482
I figured it out:
Using tcpdump, I confirmed that both servers were repeatedly talking to each other on both the Redis server ports and the handshake ports while the add-node command hung.
However, in the Redis config for each node I had:
bind 0.0.0.0
On both the masters and the replicas, the config must instead read:
bind SERVER_IP
in order for CLUSTER MEET to work properly.
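With eight node configs per server, a quick scan for leftover wildcard binds can save time. This Python sketch is only a text check under stated assumptions: that the last `bind` directive in a file is the effective one, and that `0.0.0.0`, `*`, `::` and `::*` are the wildcard forms worth flagging. The function names are hypothetical.

```python
# Address forms treated as "listen on everything" (an assumption of this sketch).
WILDCARDS = {"0.0.0.0", "*", "::", "::*"}


def bind_addresses(conf_text: str) -> list:
    """Return the addresses of the last bind directive, or [] if none is set."""
    addrs = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0].lower() == "bind":
            addrs = parts[1:]  # assumption: a later bind line overrides earlier ones
    return addrs


def has_wildcard_bind(conf_text: str) -> bool:
    """True if the effective bind directive contains a wildcard address."""
    return any(a.lstrip("-") in WILDCARDS for a in bind_addresses(conf_text))


print(has_wildcard_bind("port 6380\nbind 0.0.0.0\n"))
```

Reading each node's redis.conf and passing its text to `has_wildcard_bind` would list the files that still need the explicit server address.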
Upvotes: 3