pote

Reputation: 11

Replication issue on basic 3-node Disque cluster

I'm hitting a replication issue on a three-node Disque cluster. It seems odd because the use case is fairly typical, so it's entirely possible I'm doing something wrong.

This is how to reproduce it locally:

# relevant disque info
disque_version:1.0-rc1
disque_git_sha1:0192ba7e
disque_git_dirty:0
disque_build_id:b02910aa5c47590a

Start three Disque nodes on ports 9001, 9002 and 9003, then have the nodes on ports 9002 and 9003 meet with 9001.

127.0.0.1:9002> CLUSTER MEET 127.0.0.1 9001 #=> OK

127.0.0.1:9003> CLUSTER MEET 127.0.0.1 9001 #=> OK

HELLO reports the same three nodes when run against each of them, as expected.

127.0.0.1:9003> hello
1) (integer) 1
2) "e93cbbd17ad12369dd2066a55f9d4c51be9c93dd"
3) 1) "b61c63e8fd0c67544f895f5d045aa832ccb47e08"
   2) "127.0.0.1"
   3) "9001"
   4) "1"
4) 1) "b32eb6501e272a06d4c20a1459260ceba658b5cd"
   2) "127.0.0.1"
   3) "9002"
   4) "1"
5) 1) "e93cbbd17ad12369dd2066a55f9d4c51be9c93dd"
   2) "127.0.0.1"
   3) "9003"
   4) "1"
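
For scripting this check, the reply above can be turned into a small structure. This is a minimal sketch that assumes the reply arrives as a nested list in the shape printed above (version, the answering node's ID, then one entry per node). The `Node` type and `parse_hello` helper are hypothetical names, and treating the fourth field as a priority is my reading of the Disque documentation rather than something shown in this output.

```python
from typing import NamedTuple


class Node(NamedTuple):
    node_id: str
    host: str
    port: int
    priority: int  # assumption: the fourth field is the node priority


def parse_hello(reply):
    """Split a HELLO reply into (version, own_id, nodes)."""
    version, own_id, *raw_nodes = reply
    nodes = [Node(n[0], n[1], int(n[2]), int(n[3])) for n in raw_nodes]
    return version, own_id, nodes


# The reply from 127.0.0.1:9003 shown above, as a nested list.
reply = [
    1,
    "e93cbbd17ad12369dd2066a55f9d4c51be9c93dd",
    ["b61c63e8fd0c67544f895f5d045aa832ccb47e08", "127.0.0.1", "9001", "1"],
    ["b32eb6501e272a06d4c20a1459260ceba658b5cd", "127.0.0.1", "9002", "1"],
    ["e93cbbd17ad12369dd2066a55f9d4c51be9c93dd", "127.0.0.1", "9003", "1"],
]

version, own_id, nodes = parse_hello(reply)
ports = sorted(n.port for n in nodes)
# The node that answered is the one whose ID matches the second field.
own_port = next(n.port for n in nodes if n.node_id == own_id)
```

Comparing `ports` across the three replies is one way to confirm that every node sees the same membership.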

Enqueuing a job succeeds, but the job does not show up in either QLEN or QPEEK on the other nodes.

127.0.0.1:9001> addjob myqueue body 1 #=> D-b61c63e8-IFA29ufvL37FRVjVVWisbO/x-05a1
127.0.0.1:9001> qlen myqueue          #=> 1

127.0.0.1:9002> qlen myqueue          #=> 0
127.0.0.1:9002> qpeek myqueue 1       #=> (empty list or set)

127.0.0.1:9003> qlen myqueue          #=> 0
127.0.0.1:9003> qpeek myqueue 1       #=> (empty list or set)
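
The job ID returned by ADDJOB carries a hint about where the job was created. The sketch below splits the ID from the transcript into its dash-separated parts and compares the second part with the node IDs from HELLO. The helper name `split_job_id` is hypothetical, and the field meanings (node ID prefix, random body, TTL in hex) are my understanding of the Disque README, so treat them as assumptions.

```python
def split_job_id(job_id):
    """Split a Disque-style job ID into its dash-separated fields.

    Assumed layout: D-<node id prefix>-<base64 body>-<ttl in hex>.
    """
    marker, node_prefix, rest = job_id.split("-", 2)
    body, ttl_hex = rest.rsplit("-", 1)
    return {
        "marker": marker,
        "node_prefix": node_prefix,
        "body": body,
        "ttl_hex": ttl_hex,
    }


# Node IDs taken from the HELLO output earlier in the question.
node_ids = {
    9001: "b61c63e8fd0c67544f895f5d045aa832ccb47e08",
    9002: "b32eb6501e272a06d4c20a1459260ceba658b5cd",
    9003: "e93cbbd17ad12369dd2066a55f9d4c51be9c93dd",
}

parts = split_job_id("D-b61c63e8-IFA29ufvL37FRVjVVWisbO/x-05a1")

# Ports of the nodes whose ID starts with the prefix embedded in the job ID.
origin_ports = [
    port for port, nid in node_ids.items()
    if nid.startswith(parts["node_prefix"])
]
```

Here the prefix matches the node on port 9001, which is the node that received the ADDJOB.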

When I explicitly set a replication level higher than the number of nodes, Disque fails with a NOREPL error, as one would expect. An explicit replication level of 2 succeeds, but the jobs are still nowhere to be seen on nodes 9002 and 9003. The behavior is the same regardless of which node I add the job to.

My understanding is that replication happens synchronously when calling ADDJOB (unless ASYNC is used explicitly), but it doesn't seem to be working properly. The test suite passes on the master branch, so I'm hitting a wall here and will have to dig into the source code. Any help will be greatly appreciated!

Upvotes: 1

Views: 83

Answers (1)

soveran

Reputation: 890

The job is replicated, but it's enqueued only in one node. Try killing the first node to see the job enqueued in a different one.
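
The distinction between holding a copy of a job and having it queued can be illustrated with a toy model. This is not Disque's implementation: `ToyNode` and `ToyCluster` are hypothetical classes, and the immediate re-queue on `kill` is a simplification (my understanding is that a real cluster re-queues after a delay rather than instantly).

```python
class ToyNode:
    def __init__(self, name):
        self.name = name
        self.jobs = {}    # job_id -> body: every copy this node holds
        self.queue = []   # job IDs actually queued for delivery here
        self.alive = True


class ToyCluster:
    def __init__(self, names):
        self.nodes = {name: ToyNode(name) for name in names}

    def addjob(self, receiver, job_id, body, replicate):
        alive = [n for n in self.nodes.values() if n.alive]
        if replicate > len(alive):
            raise RuntimeError("NOREPL")  # not enough nodes for the copies
        first = self.nodes[receiver]
        others = [n for n in alive if n is not first][:replicate - 1]
        for node in [first, *others]:
            node.jobs[job_id] = body      # the copy is stored on each replica
        first.queue.append(job_id)        # but queued only on the receiver

    def qlen(self, name):
        return len(self.nodes[name].queue)

    def kill(self, name):
        dead = self.nodes[name]
        dead.alive = False
        # Simplification: a surviving replica queues the job right away.
        for job_id in dead.queue:
            for node in self.nodes.values():
                if node.alive and job_id in node.jobs:
                    node.queue.append(job_id)
                    break


cluster = ToyCluster(["9001", "9002", "9003"])
cluster.addjob("9001", "job-1", "body", replicate=2)

before = {name: cluster.qlen(name) for name in cluster.nodes}
copies = [name for name, node in cluster.nodes.items() if "job-1" in node.jobs]

cluster.kill("9001")
after = {name: cluster.qlen(name) for name, n in cluster.nodes.items() if n.alive}
```

In this model `before` shows a queue length of 1 only on 9001 even though 9002 also holds a copy, which mirrors the QLEN output in the question. On a real cluster, running `SHOW <job-id>` against another node is, as far as I know, a way to check whether that node holds a copy.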

Upvotes: 0
