Reputation: 4944
I have two Amazon Linux EC2 instances running HAProxy. I want each instance to monitor the other, and if one instance becomes unavailable, the other should issue an API command to move the Elastic IP to the active server.
I created a Bash script that does the monitoring every XX seconds. I need the script to run as a service, so I created a service wrapper based on a template that I found, placed it in /etc/init.d, and registered it as a service.
The problem is that when I issue the command # service hamonitor start, it says "Starting hamonitor...", but I never see the OK message. If I issue the stop command, it fails, and if I issue the status command, it says the service is not running. But if I check the logs, they show that the script is in fact running. I assume that I need a proper PID file, and/or that because the script runs in an infinite loop, it never completes, so the OK is never issued.
Service Wrapper:
#!/bin/sh
#
# /etc/init.d/hamonitor
# Subsystem file for "hamonitor" server
#
# chkconfig: 2345 95 05 (1)
# description: hamonitor server daemon
#
# processname: hamonitor
### BEGIN INIT INFO
# Provides:
# Required-Start:
# Required-Stop:
# Should-Start:
# Should-Stop:
# Default-Start:
# Default-Stop:
# Short-Description:
# Description:
### END INIT INFO
# source function library
. /etc/rc.d/init.d/functions
PROG=hamonitor
EXEC=/etc/haproxy/hamonitor
LOCKFILE=/var/lock/subsys/$prog
PIDFILE=/var/run/$prog.pid
RETVAL=0
start() {
echo -n $"Starting $PROG:"
echo
#daemon $EXEC &
/etc/haproxy/hamonitor &
RETVAL=$?
if [ $RETVAL -eq 0 ]; then
touch LOCKFILE
touch PIDFILE
echo "[ OK ]"
else
echo "[ FAIL: ${retval} ]"
fi
return $RETVAL
}
stop() {
echo -n $"Stopping $PROG:"
echo
killproc $PROG -TERM
RETVAL=$?
if [ $RETVAL -eq 0 ]; then
rm -f LOCKFILE
rm -f PIDFILE
echo "[ OK ]"
else
echo "[ FAIL: ${RETVAL} ]"
fi
return $RETVAL
}
case "$1" in
start)
start
;;
stop)
stop
;;
status)
status $PROG
RETVAL=$?
;;
restart)
stop
start
;;
*)
echo $"Usage: $0 {start|stop|status|restart}"
RETVAL=1
esac
exit $RETVAL
App:
#!/usr/bin/env bash
export EC2_HOME=/opt/aws/apitools/ec2
export JAVA_HOME=/usr/lib/jvm/jre
AWS_ACCESS_KEY="XXXXXXXXXXXXXXXXXXXXXXXXX"
AWS_SECRET_KEY="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
VIP1="1.2.3.4"
VIP1_ALLOCATIONID="eipalloc-XXXXXXX"
THIS_NODE_EC2_ID="i-XXXXXXX"
THIS_NODE_PRIVATE_IPADDRESS1="10.60.0.11"
THIS_NODE_HEALTHCHECK_URL="http://10.60.0.10/haproxy?monitor"
OTHER_NODE_HEALTHCHECK_URL="http://10.60.49.50/haproxy?monitor"
CHECK_OTHER_INTERVAL=5
CHECK_OTHER_FAIL_COUNT=0
CHECK_OTHER_RUN_COUNT=0
AFTER_TAKEOVER_WAIT=30
function takeover_vips {
/opt/aws/bin/ec2-associate-address -aws-access-key ${AWS_ACCESS_KEY} -aws-secret-key ${AWS_SECRET_KEY} -a ${VIP1_ALLOCATIONID} -i ${THIS_NODE_EC2_ID} -private-ip-address ${THIS_NODE_PRIVATE_IPADDRESS1} -allow-reassociation > /dev/null
}
function does_this_node_have_ips {
is_active=$(/opt/aws/bin/ec2-describe-addresses -aws-access-key ${AWS_ACCESS_KEY} -aws-secret-key ${AWS_SECRET_KEY} | grep ${VIP1} | grep ${THIS_NODE_EC2_ID})
if [ "$is_active" = "" ]; then
echo "no"
else
echo "yes"
fi
}
function log_msg {
msg=$1
msg="$(date) -- ${msg}"
echo ${msg} >> /var/log/hamonitorlog
}
while [ . ]; do
healthcheck_response=$(curl -sL -w "%{http_code}" ${OTHER_NODE_HEALTHCHECK_URL} -o /dev/null)
if [ "$healthcheck_response" != "200" ]; then
CHECK_OTHER_FAIL_COUNT=$((CHECK_OTHER_FAIL_COUNT+1))
if [ "$CHECK_OTHER_FAIL_COUNT" -gt 2 ]; then
takeover_vips
CHECK_OTHER_FAIL_COUNT=0
sleep ${AFTER_TAKEOVER_WAIT}
fi
fi
sleep ${CHECK_OTHER_INTERVAL}
done
Upvotes: 3
Views: 5081
Reputation: 22469
Some Linux distributions use Upstart and others use init; I assume you have init. chkconfig is used to maintain the runlevel symlinks. You should confirm that the comment,
# chkconfig: 2345 95 05 (1)
is correct for your system.
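For reference, the three numbers in that comment are the runlevels (2, 3, 4, 5), the start priority (95), and the stop priority (05). A minimal sketch for pulling them out of an init script is shown here; the helper name chkconfig_header is made up for illustration. After registering with chkconfig --add hamonitor, chkconfig --list hamonitor should show which runlevels the service is on for.

```shell
# Hypothetical helper: print "<runlevels> <start priority> <stop priority>"
# from the "# chkconfig:" header line of an init script.
chkconfig_header() {
    sed -n 's/^# chkconfig:[[:space:]]*\([0-9-]*\)[[:space:]]*\([0-9]*\)[[:space:]]*\([0-9]*\).*/\1 \2 \3/p' "$1"
}

# Example call (path taken from the question):
#   chkconfig_header /etc/init.d/hamonitor
```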
As a guess, you need to invoke the script via daemon. This may be a shell function in some init script library, like /etc/rc.d/init.d/functions. I would suggest that you use the daemon() function if it exists. Use one of the following:
daemon $EXEC & #option1
nohup /etc/haproxy/hamonitor < /dev/null > /dev/null 2>&1 & #option2
/etc/haproxy/hamonitor& #option3, 2 lines.
disown $! #...
This is related to SIGCHLD and process return status (see man wait for more). You may also need to detach hamonitor from the controlling terminal. You can use logger to send information to the system logs in this case; I guess the App script is the hamonitor code? Just change echo to logger.
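As a rough sketch of that change, the log_msg function from the App script could hand its message to logger instead of appending to a file. The tag hamonitor and the priority daemon.notice are assumptions; pick whatever suits your syslog setup.

```shell
# Sketch: log through syslog instead of appending to a file with echo.
# The tag "hamonitor" and the priority "daemon.notice" are assumptions.
log_msg() {
    logger -t hamonitor -p daemon.notice "$1"
}

# Example call inside the monitoring loop:
#   log_msg "peer health check failed, taking over the Elastic IP"
```

syslog adds its own timestamp, so the $(date) prefix from the original function is no longer needed.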
If hamonitor needs stdout, stdin, and/or stderr, you may need to redirect them to some other file. You might also consider running it via screen if this is the case.
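To see why the redirections can matter, here is a small illustration that is independent of hamonitor: a background job that inherits a pipe keeps the reader of that pipe waiting until the job exits. The function names held and detached are made up for the demonstration, and sleep 2 stands in for the long-running script. Whether this is what happens in your case depends on how the init script is being invoked.

```shell
# Returns only after about 2 seconds: the background sleep still holds
# the write end of the pipe, so cat does not see end-of-file.
held() {
    ( sleep 2 & ) | cat
}

# Returns right away: the background job's stdin, stdout, and stderr
# are redirected, so nothing keeps the pipe open.
detached() {
    ( sleep 2 < /dev/null > /dev/null 2>&1 & ) | cat
}
```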
Edit: The last option can be used to create a proper PIDFILE. For instance,
# !!! optional grabbing of lock here...
/etc/haproxy/hamonitor & # spawn in bg
HA_PID=$! # record spawn pid
echo $HA_PID > $PIDFILE # record the PID to a file for `stop`.
# !!! optional release of lock here...
disown $HA_PID # detach script from terminal.
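Putting those pieces together, a start/stop pair built around the PID file might look roughly like the sketch below. This is an illustration under assumptions, not a drop-in replacement for your init script: the default PIDFILE path, the is_running helper, and the sleep 300 stand-in for /etc/haproxy/hamonitor are all made up for the example.

```shell
#!/bin/bash
# Sketch only: the PIDFILE default and the stand-in command are assumptions.
PIDFILE="${PIDFILE:-/tmp/hamonitor-demo.pid}"
EXEC="${EXEC:-sleep 300}"   # stand-in for /etc/haproxy/hamonitor

is_running() {
    # True if the PID file exists and the recorded process is alive.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

start() {
    if is_running; then
        echo "already running"
        return 0
    fi
    # Spawn in the background with stdio detached from the caller.
    nohup $EXEC < /dev/null > /dev/null 2>&1 &
    echo $! > "$PIDFILE"            # record the PID for stop
    disown $! 2>/dev/null || true   # drop the job from the shell's job table
}

stop() {
    if [ ! -f "$PIDFILE" ]; then
        echo "not running"
        return 0
    fi
    # Signal the recorded PID, then remove the PID file.
    kill -TERM "$(cat "$PIDFILE")" 2>/dev/null || true
    rm -f "$PIDFILE"
}
```

Note that the variables are always expanded as "$PIDFILE", with the dollar sign; touch PIDFILE would create a file literally named PIDFILE in the current directory.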
Services should never use echo and the like; logger is the better option. This is probably not your issue unless hamonitor tries to read from something. The main issue is that start() will wait for hamonitor to finish if you don't disown it, so the rc script's start will never finish.
More generally, you can look at /etc/rc.d/init.d/functions, provide a link to your copy of the file, or provide your distribution and version (or at least its Linux Standard Base conformance, which seems to define how this should work in its different versions). The file can be different on every Linux distribution. If you understand scripting, you can read this file yourself to see which environment variables, files, etc. are expected and which of its functions you use. For instance, killproc is most likely defined there.
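A quick way to see what the library offers is to list the function names it defines. The sketch below assumes the functions are written in the name() { style at the start of a line; the helper name list_functions is made up.

```shell
# Hypothetical helper: list shell functions defined as "name() {" in a file.
list_functions() {
    grep -E '^[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\(\)' "$1" | sed 's/[[:space:]]*().*//'
}

# Example call:
#   list_functions /etc/rc.d/init.d/functions
```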
Upvotes: 3