Kuldeep Singh

Reputation: 1274

redis sentinel failover not happening in docker swarm

I have been trying to set up Redis in sentinel mode using a docker-compose file. Below are the contents of my compose file:

version: '3.3'
services:
  redis-master:
    image: redis:latest
    deploy:
      replicas: 1
    networks:
      - Overlay_Network

  redis-slave:
    image: redis:latest
    command: redis-server --slaveof redis-master 6379
    depends_on:
      - redis-master
    deploy:
      replicas: 2
    networks:
      - Overlay_Network

  sentinel:
    image: sentinel:latest
    environment:
      - SENTINEL_DOWN_AFTER=5000
      - SENTINEL_FAILOVER=5000
      - REDIS_MASTER=redis-master
    depends_on:
      - redis-master
      - redis-slave
    deploy:
      replicas: 3
    networks:
      - Overlay_Network

networks:
  Overlay_Network:
    external:
      name: Overlay_Network

Here I am creating three services: redis-master, redis-slave, and sentinel (a local Docker image that starts Redis in sentinel mode based on the passed environment variables). I followed this repository for creating the sentinel image: https://gitlab.ethz.ch/amiv/redis-cluster/tree/master
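For context, such a sentinel entrypoint boils down to something like the sketch below: it resolves the master hostname to an IP (Sentinel 5.x cannot monitor a hostname directly, which is why the log further down shows an IP) and renders a sentinel.conf from the environment variables. The config path and the parallel-syncs value are assumptions; the quorum of 2 matches the log. Treat this as an approximation of the linked image, not its exact contents.

#!/bin/sh
# Resolve the master service name to an IP; Sentinel 5.x requires one.
MASTER_IP=$(getent hosts "${REDIS_MASTER}" | awk '{print $1}')

# Render sentinel.conf from the environment variables in the compose file.
cat > /sentinel.conf <<EOF
port 26379
sentinel monitor mymaster ${MASTER_IP} 6379 2
sentinel down-after-milliseconds mymaster ${SENTINEL_DOWN_AFTER}
sentinel failover-timeout mymaster ${SENTINEL_FAILOVER}
sentinel parallel-syncs mymaster 1
EOF

# Start redis-server in sentinel mode.
exec redis-server /sentinel.conf --sentinel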

When I use docker-compose to run the services, it works fine.

docker-compose -f docker-compose.yml up -d

It starts all services with a single instance of each. Later I manually scale redis-slave to 2 instances and sentinel to 3 instances (see the command below). When I then stop the redis-master container, sentinel notices it and promotes one of the slave nodes to master. It works as expected.
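The scaling step is along these lines (with recent docker-compose releases the --scale flag replaces the older scale subcommand):

docker-compose -f docker-compose.yml up -d --scale redis-slave=2 --scale sentinel=3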

The issue happens when I run it in swarm mode with the docker stack deploy command, using the same compose file.

docker stack deploy -c docker-compose.yml <stack-name>

It starts all the services on the overlay network: 1 instance of redis-master, 2 of redis-slave, and 3 of sentinel. When I stop the redis-master container, sentinel cannot promote any of the slave nodes to master. It seems sentinel cannot reach the slave nodes: it adds a slave and then shows it in down status. Here is a snippet from the sentinel log file:

1:X 04 Jul 2019 14:31:36.465 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 04 Jul 2019 14:31:36.465 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 04 Jul 2019 14:31:36.465 # Configuration loaded
1:X 04 Jul 2019 14:31:36.466 * Running mode=sentinel, port=26379.
1:X 04 Jul 2019 14:31:36.466 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 04 Jul 2019 14:31:36.468 # Sentinel ID is e84a635f6cf4c0ee4454922a557a7c0fba00fadd
1:X 04 Jul 2019 14:31:36.468 # +monitor master mymaster 10.0.22.123 6379 quorum 2
1:X 04 Jul 2019 14:31:36.469 * +slave slave 10.0.22.125:6379 10.0.22.125 6379 @ mymaster 10.0.22.123 6379
1:X 04 Jul 2019 14:31:38.423 * +sentinel sentinel f92b9499bff409558a2eb985ef949dfc7050c528 10.0.22.130 26379 @ mymaster 10.0.22.123 6379
1:X 04 Jul 2019 14:31:38.498 * +sentinel sentinel 6e32d6bfea4142a0bc77a74efdfd24424cbe026b 10.0.22.131 26379 @ mymaster 10.0.22.123 6379
1:X 04 Jul 2019 14:31:41.538 # +sdown slave 10.0.22.125:6379 10.0.22.125 6379 @ mymaster 10.0.22.123 6379

I thought it could be due to the start order of the containers, but the depends_on field is not valid in stack mode and I could not find any other way to define a start order there.

When I do docker network inspect on the overlay network, here is the output:

"Containers": {
    "57b7620ef75956464ce274e66e60c9cb5a9d8b79486c5b80016db4482126916b": {
        "Name": "sws_sentinel.3.y8sdpj8609ilq22xinzykbxkm",
        "EndpointID": "a95ab07b07c68a32227be3b5da4d378b82f24aab4279bfaa13899a2a7184ce09",
        "MacAddress": "02:42:0a:00:16:84",
        "IPv4Address": "10.0.22.132/24",
        "IPv6Address": ""
    },
    "982222f1b87e1483ec791f382678ef02abcdffe74a5df13a0c0476f7f3a599a7": {
        "Name": "sws_redis-slave.1.uxwkndhkdnizyicwulzli964r",
        "EndpointID": "f5f8fa056622b1529351355c3760c3f45357c7b3de3fe4d2ee90e2d490328f2a",
        "MacAddress": "02:42:0a:00:16:80",
        "IPv4Address": "10.0.22.128/24",
        "IPv6Address": ""
    },
    "c55376217215a1c11b62ac9d22d28eaa1bcda89484a0202b208e557feea4dd35": {
        "Name": "sws_redis-slave.2.s8ha5xmvx6sue2pj6fav8bcbx",
        "EndpointID": "6dcb13e23a8b4c0b49d7dc41e5813b317b8d67377ac30a476261108b8cdeb3f8",
        "MacAddress": "02:42:0a:00:16:7f",
        "IPv4Address": "10.0.22.127/24",
        "IPv6Address": ""
    },
    "cd6d72547ef3fb34ece45ad0201555124505379182f7445373025e1b9a115554": {
        "Name": "sws_redis-master.1.3rhfihzqip2a44xq2uerhqkjt",
        "EndpointID": "9074f9c911e03de0f27e4fb6b75afdf6bb38a111a511738451feb5e64c8dbff3",
        "MacAddress": "02:42:0a:00:16:7c",
        "IPv4Address": "10.0.22.124/24",
        "IPv6Address": ""
    },
    "lb-SA_Monitor_Overlay": {
        "Name": "SA_Monitor_Overlay-endpoint",
        "EndpointID": "2fb84ac75f5eee015b80b55713da83d1afb7dfa7ed4c1f5eda170f4b8daf8884",
        "MacAddress": "02:42:0a:00:16:7d",
        "IPv4Address": "10.0.22.125/24",
        "IPv6Address": ""
    }
}

Here I see the slaves are running on IPs 10.0.22.128 and 10.0.22.127, but in the sentinel log file it is trying to add a slave using IP 10.0.22.125. Why is that? Could this be the issue?
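For anyone reproducing this, the addresses sentinel has registered for the slaves can be queried directly from one of the sentinel containers (the container name is a placeholder):

docker exec -it <sentinel-container> redis-cli -p 26379 sentinel slaves mymaster

It should show the same 10.0.22.125 address that appears in the log above.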

Let me know if any more detail is required.

Upvotes: 1

Views: 1919

Answers (1)

Kuldeep Singh

Reputation: 1274

I concluded that it was happening due to Docker Swarm's default load balancing. Sentinel gets its information about the slaves from the master node, but the slaves were not registered with the master under their actual container IP addresses; the master saw them behind the swarm load-balancer endpoint instead (the lb- entry with IP 10.0.22.125 in the network inspect output above). Sentinel was therefore unable to reach the slaves on that IP and marked them as down.
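The mismatch can be seen on the master itself by asking which replicas it knows about (container name is a placeholder):

docker exec -it <redis-master-container> redis-cli info replication

In this setup the slave0/slave1 lines should report ip=10.0.22.125, i.e. the load-balancer endpoint from the network inspect output, not a real slave container.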

This is also mentioned in the Redis documentation:

https://redis.io/topics/replication [Configuring replication in Docker and NAT]

https://redis.io/topics/sentinel [Sentinel, Docker, NAT, and possible issues]

As a solution, I made a custom Dockerfile for the redis-slave nodes. It uses a redis.conf and an entrypoint.sh script. In entrypoint.sh I get the container's real IP, write the following directives into redis.conf, and as the last step start redis-server using that updated redis.conf:

slave-announce-ip <CONTAINER_IP_ADDRESS>
slave-announce-port 6379
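The entrypoint amounts to something like this sketch. The redis.conf path and picking the IP via hostname -i are assumptions that may need adjusting, for example when the container is attached to more than one network:

#!/bin/sh
# Determine this container's real IP. hostname -i can return several
# addresses if the container sits on multiple networks; take the first.
CONTAINER_IP=$(hostname -i | awk '{print $1}')

# Append the announce directives so the slave registers with its real
# address instead of the swarm load-balancer endpoint.
{
  echo "slave-announce-ip ${CONTAINER_IP}"
  echo "slave-announce-port 6379"
} >> /etc/redis.conf

# Start the server with the updated configuration.
exec redis-server /etc/redis.conf --slaveof redis-master 6379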

You can also do the same for the sentinel nodes.
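For sentinel the corresponding directives, written into sentinel.conf the same way from its entrypoint, are:

sentinel announce-ip <CONTAINER_IP_ADDRESS>
sentinel announce-port 26379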

Now the slaves are registered with their real container IP address and port, and sentinel is able to communicate with them.

Upvotes: 1
