Muhammad Munir

Reputation: 1

Keepalived: transition to the backup node does not happen when the check script on the master finds an application's service is down

I am trying to configure keepalived so that if an application or service running on the master node fails, keepalived treats it as a fault and the backup node becomes master and takes over the floating IP from the master node.

I have written a script that checks service X on the master server; if the service goes down, keepalived should fail over to the backup node.

My keepalived configuration is:

global_defs {
    enable_script_security
}
vrrp_script keepalived_check {
    script "/root/new/check.sh"
    interval 1
    timeout 1
    rise 2
    fall 2
    weight 0 reverse
}
vrrp_instance V1_11 {
    state MASTER
    interface ens3
    virtual_router_id 51
    priority 101
    advert_int 1
    unicast_src_ip 192.168.10.129
    unicast_peer {
        192.168.10.130
    }
    authentication {
        auth_type PASS
        auth_pass 1122
    }
    virtual_ipaddress {
        192.168.10.231/24
    }
    track_script {
        keepalived_check
    }
}

The script that checks the service status:

#!/bin/bash
var="$(systemctl is-active myservice.service)"

if [ $var == "active" ]
then
    echo 0
else
    echo 5
fi

I manually stopped "myservice" using:

systemctl stop myservice.service

The script's output is 5, as expected. But with the configuration above, the master node stays master and never hands the floating IP over to the backup node. Is there a particular setting I have missed?

Upvotes: 0

Views: 2162

Answers (1)

Mortein

Reputation: 153

The exit code of your script is always 0 (success), because echo merely writes your value (0 or 5) to stdout and then succeeds. Change each echo to exit (e.g. exit 5) so the script actually returns that code.

Check the return code by running your script, then running echo $?.
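
For example, a minimal corrected version of the check script, keeping your structure and just swapping echo for exit, might look like this:

#!/bin/bash
# Check whether myservice.service is active and report the result via the
# exit code, which is what keepalived's vrrp_script actually evaluates.
var="$(systemctl is-active myservice.service)"

if [ "$var" == "active" ]
then
    exit 0    # success: keepalived keeps the instance up
else
    exit 5    # failure: after <fall> consecutive failures the instance enters FAULT
fi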

Reverse is not needed. From the keepalived manual, weight 0 reverse will:

'weight 0 reverse' will cause the vrrp instance to be down when the script is up, and vice versa.

As your script should exit 0 on success and non-zero on failure, weight 0 reverse will cause the instance to be up when your script returns non-zero (5). You want plain weight 0 instead. From the same manual:

The default weight equals 0, which means that any VRRP instance monitoring the script will transition to the fault state after <fall> consecutive failures of the script.
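
With that in mind, the tracking block could look something like this (a sketch based on your configuration; the script path is revisited in the SELinux point below):

vrrp_script keepalived_check {
    script "/root/new/check.sh"
    interval 1
    timeout 1
    rise 2
    fall 2
    weight 0    # default behaviour: <fall> consecutive failures put the instance into FAULT
}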

SELinux will also block your script from running unless it has the correct context, so it's likely that the MASTER node would not demote itself to BACKUP: the check is failing on both/all nodes, and the MASTER node has the highest priority (101 in this case).

The keepalived package in RHEL creates the directory /usr/libexec/keepalived/ for these types of scripts, with the appropriate SELinux rules/contexts.
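
If you want to confirm that SELinux is what is blocking the check, something along these lines should show it (assuming the audit tools are installed):

ausearch -m AVC -ts recent | grep check.sh    # recent SELinux denials mentioning the script
ls -Z /root/new/check.sh                      # context of the script in its current location
ls -Z -d /usr/libexec/keepalived/             # compare with the context of the packaged directory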


Do something like the following:

  1. Move your check.sh script to /usr/libexec/keepalived/ and restore the SELinux context using restorecon -Rv /usr/libexec/keepalived/
  2. Update your script line to script "/usr/libexec/keepalived/check.sh"
  3. Switch your echo lines to exit lines, to return the codes you want
  4. Restart keepalived using systemctl restart keepalived.service to reload the configuration
  5. Use journalctl -u keepalived.service to see the node entering MASTER state:
    Script `keepalived_check` now returning 0
    VRRP_Script(keepalived_check) succeeded
    (V1_11) Entering MASTER STATE
    
  6. Stop your "myservice" using systemctl stop myservice.service
  7. Use journalctl -u keepalived.service to see the node entering FAULT state (as opposed to BACKUP state, where the node is running but is not the highest priority); a short verification sketch for the backup node follows this list:
    Script `keepalived_check` now returning 1
    VRRP_Script(keepalived_check) failed (exited with status 5)
    (V1_11) Entering FAULT STATE
    

Upvotes: 1
