I am trying to configure keepalived in such a way that if any application or service running on master node fails, keepalived should consider it as fault and backup node should act as master and take over the floating IP from master node.
I have written a script to check if service X on master server goes down, then it should transition to backup node.
My keepalived configuration is:
global_defs {
    enable_script_security
}

vrrp_script keepalived_check {
    script "/root/new/check.sh"
    interval 1
    timeout 1
    rise 2
    fall 2
    weight 0 reverse
}

vrrp_instance V1_11 {
    state MASTER
    interface ens3
    virtual_router_id 51
    priority 101
    advert_int 1
    unicast_src_ip 192.168.10.129
    unicast_peer {
        192.168.10.130
    }
    authentication {
        auth_type PASS
        auth_pass 1122
    }
    virtual_ipaddress {
        192.168.10.231/24
    }
    track_script {
        keepalived_check
    }
}
The script that checks the service status:
#!/bin/bash
var="$(systemctl is-active myservice.service)"
if [ "$var" == "active" ]
then
    echo 0
else
    echo 5
fi
I manually stopped myservice using:

systemctl stop myservice.service

The script's output is then 5, as expected. But with the above configuration, the master node remains primary and does not hand over ownership to the backup node. Is there a particular config option that I have missed? If so, kindly help me find it.
The return code for your script is always 0 (a success), as echo is successfully writing your value (0 or 5) to the console. Change each echo to exit (e.g. exit 5) to return the stated code. Check the return code by running your script and then running echo $?.
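A quick way to see the difference from a shell (an illustration, not one of the steps below):

```shell
#!/bin/bash
# 'echo 5' prints the text "5", but the echo command itself succeeds,
# so its exit status is 0. 'exit 5' actually sets the status that a
# caller such as keepalived's vrrp_script would observe.
bash -c 'echo 5' > /dev/null
echo "status after 'echo 5': $?"    # prints 0

bash -c 'exit 5'
echo "status after 'exit 5': $?"    # prints 5
```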
Reverse is not needed. From the keepalived manual, weight 0 reverse will:

    'weight 0 reverse' will cause the vrrp instance to be down when the script is up, and vice versa.

As your script should return 0 on success and non-zero on failure, weight 0 reverse will cause the node to be up when your script returns non-zero (5). You want weight 0 instead, which behaves as follows:

    The default weight equals 0, which means that any VRRP instance monitoring the script will transition to the fault state after <fall> consecutive failures of the script.
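Concretely, the tracker block from the question would keep the same values and simply drop the reverse keyword:

```
vrrp_script keepalived_check {
    script "/root/new/check.sh"
    interval 1
    timeout 1
    rise 2
    fall 2
    weight 0
}
```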
SELinux will also block your scripts from running unless they have the correct context, so it's likely that the MASTER node would not demote itself to BACKUP, as the check is failing on both/all nodes, and the MASTER node has the highest priority (101 in this case).
The keepalived package in RHEL creates the directory /usr/libexec/keepalived/ for these types of scripts, with the appropriate SELinux rules/contexts.
Do something like the following:

1. Copy the check.sh script to /usr/libexec/keepalived/ and restore the SELinux context using restorecon -Rv /usr/libexec/keepalived/
2. Update the script line to script "/usr/libexec/keepalived/check.sh"
3. Change the echo lines to exit lines, to return the codes you want
4. Run systemctl restart keepalived.service to reload the configuration
5. Run journalctl -u keepalived.service to see the node entering MASTER state:
Script `keepalived_check` now returning 0
VRRP_Script(keepalived_check) succeeded
(V1_11) Entering MASTER STATE
Then run systemctl stop myservice.service and check journalctl -u keepalived.service to see the node entering FAULT state (as opposed to BACKUP state, where the node is running but is not the highest priority):
Script `keepalived_check` now returning 1
VRRP_Script(keepalived_check) failed (exited with status 5)
(V1_11) Entering FAULT STATE
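Putting the pieces together, the corrected check script might look like the sketch below (myservice.service is the unit name from the question; a real run requires systemd and that unit to exist):

```shell
#!/bin/bash
# /usr/libexec/keepalived/check.sh -- sketch of the corrected check.
# keepalived reads the exit status, not the printed text:
# exit 0 = script success (service up), non-zero = failure.
state="$(systemctl is-active myservice.service 2>/dev/null)"
if [ "$state" = "active" ]
then
    exit 0
else
    exit 5
fi
```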