I'm hitting a replication issue on a three-node Disque cluster. It seems odd, because the use case is fairly typical, so it's entirely possible I'm doing something wrong.
This is how to reproduce it locally:
# relevant disque info
disque_version:1.0-rc1
disque_git_sha1:0192ba7e
disque_git_dirty:0
disque_build_id:b02910aa5c47590a
Start three Disque nodes on ports 9001, 9002 and 9003, then have the nodes on ports 9002 and 9003 meet with 9001:
127.0.0.1:9002> CLUSTER MEET 127.0.0.1 9001 #=> OK
127.0.0.1:9003> CLUSTER MEET 127.0.0.1 9001 #=> OK
The HELLO command reports the same data on all three nodes, as expected:
127.0.0.1:9003> hello
1) (integer) 1
2) "e93cbbd17ad12369dd2066a55f9d4c51be9c93dd"
3) 1) "b61c63e8fd0c67544f895f5d045aa832ccb47e08"
   2) "127.0.0.1"
   3) "9001"
   4) "1"
4) 1) "b32eb6501e272a06d4c20a1459260ceba658b5cd"
   2) "127.0.0.1"
   3) "9002"
   4) "1"
5) 1) "e93cbbd17ad12369dd2066a55f9d4c51be9c93dd"
   2) "127.0.0.1"
   3) "9003"
   4) "1"
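To check cluster membership programmatically rather than by eye, the HELLO reply can be unpacked as below. This is a minimal Python sketch, assuming the reply arrives as nested lists of strings (for example from a client configured to decode responses). `parse_hello` and `NodeInfo` are hypothetical helpers, and reading the fourth field of each node entry as a priority is an assumption.

```python
from typing import List, NamedTuple, Tuple


class NodeInfo(NamedTuple):
    node_id: str
    host: str
    port: int
    priority: int  # assumed meaning of the fourth field in each node entry


def parse_hello(reply) -> Tuple[int, str, List[NodeInfo]]:
    """Split a HELLO reply into (format version, responding node's ID, known nodes)."""
    version, own_id, *entries = reply
    nodes = [NodeInfo(e[0], e[1], int(e[2]), int(e[3])) for e in entries]
    return int(version), own_id, nodes


# The reply from node 9003 shown above, as a decoding client might return it.
reply = [
    1,
    "e93cbbd17ad12369dd2066a55f9d4c51be9c93dd",
    ["b61c63e8fd0c67544f895f5d045aa832ccb47e08", "127.0.0.1", "9001", "1"],
    ["b32eb6501e272a06d4c20a1459260ceba658b5cd", "127.0.0.1", "9002", "1"],
    ["e93cbbd17ad12369dd2066a55f9d4c51be9c93dd", "127.0.0.1", "9003", "1"],
]
version, own_id, nodes = parse_hello(reply)
print(sorted(n.port for n in nodes))  # [9001, 9002, 9003]
print(own_id == nodes[2].node_id)     # True
```

Seeing all three ports on every node confirms the CLUSTER MEET step worked, which rules out a membership problem.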
Enqueuing a job succeeds, but the job does not show up in either QLEN or QPEEK on the other nodes:
127.0.0.1:9001> addjob myqueue body 1 #=> D-b61c63e8-IFA29ufvL37FRVjVVWisbO/x-05a1
127.0.0.1:9001> qlen myqueue #=> 1
127.0.0.1:9002> qlen myqueue #=> 0
127.0.0.1:9002> qpeek myqueue 1 #=> (empty list or set)
127.0.0.1:9003> qlen myqueue #=> 0
127.0.0.1:9003> qpeek myqueue 1 #=> (empty list or set)
When I explicitly set a replication level higher than the number of nodes, Disque fails with NOREPL, as one would expect. An explicit replication level of 2 succeeds, but the jobs are still nowhere to be seen on nodes 9002 and 9003. The same behavior happens regardless of which node I add the job to.
My understanding is that replication happens synchronously when calling ADDJOB (unless ASYNC is used explicitly), but it doesn't seem to be working. The test suite passes on the master branch, so I'm hitting a wall here and will have to dig into the source code. Any help will be greatly appreciated!
The job is replicated, but it is enqueued on only one node, so QLEN and QPEEK on the other nodes report nothing even though they hold a copy. Try killing the first node and you should see the job enqueued on a different one.
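The distinction between "holds a copy" and "has it queued" can be illustrated with a toy, in-memory Python model. It is not Disque's code and ignores its real protocol: `ToyCluster`, `addjob`, `kill` and `requeue_orphans` are hypothetical names, and `requeue_orphans` is an assumed simplification of Disque re-enqueueing a job from a surviving copy after the node that queued it goes away.

```python
from dataclasses import dataclass, field


class NoReplError(Exception):
    """Stand-in for Disque's NOREPL error reply (toy model only)."""


@dataclass
class Node:
    port: int
    alive: bool = True
    copies: set = field(default_factory=set)   # job IDs this node holds a copy of
    queue: list = field(default_factory=list)  # job IDs actually enqueued here

    def qlen(self) -> int:
        # In this model, like QLEN in the question: only enqueued jobs count.
        return len(self.queue)


class ToyCluster:
    """Hypothetical in-memory model; not Disque's real implementation."""

    def __init__(self, ports):
        self.nodes = {p: Node(p) for p in ports}
        self._counter = 0

    def addjob(self, port: int, replicate: int) -> str:
        origin = self.nodes[port]
        if not origin.alive:
            raise ValueError(f"node {port} is down")
        alive = [n for n in self.nodes.values() if n.alive]
        if replicate > len(alive):
            raise NoReplError(f"cannot replicate to {replicate} nodes")
        self._counter += 1
        job_id = f"job-{self._counter}"
        # Store copies on the receiving node plus (replicate - 1) other live nodes.
        peers = [n for n in alive if n is not origin]
        for node in [origin] + peers[: replicate - 1]:
            node.copies.add(job_id)
        # Only the node that received ADDJOB enqueues the job.
        origin.queue.append(job_id)
        return job_id

    def kill(self, port: int) -> None:
        node = self.nodes[port]
        node.alive = False
        node.queue.clear()

    def requeue_orphans(self) -> None:
        # Simplified failover: a job copied on live nodes but queued on none of
        # them is enqueued on the first live node (by port) holding a copy.
        live = sorted((n for n in self.nodes.values() if n.alive), key=lambda n: n.port)
        queued = {j for n in live for j in n.queue}
        for job_id in sorted({j for n in live for j in n.copies} - queued):
            holder = next(n for n in live if job_id in n.copies)
            holder.queue.append(job_id)


cluster = ToyCluster([9001, 9002, 9003])
job = cluster.addjob(9001, replicate=2)
print([cluster.nodes[p].qlen() for p in (9001, 9002, 9003)])  # [1, 0, 0]
cluster.kill(9001)
cluster.requeue_orphans()
print(cluster.nodes[9002].qlen())  # 1
```

The job exists on two nodes after ADDJOB, yet only 9001 reports it in its queue. That matches the observation in the question without replication having failed.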