jpshook

Reputation: 4944

How to run my looping Bash script as a service?

I have 2 Amazon Linux EC2 instances running HAProxy. I want each instance to monitor the other, and if one instance becomes unavailable, the other should issue an API command to move the Elastic IP to the active server.

I created a Bash script that does the monitoring every XX seconds. I need the script to run as a service, so I created a service wrapper in /etc/init.d, based on a template I found, and registered it as a service.

The problem is that when I run "service hamonitor start", it says "Starting hamonitor...", but I never see the OK message. If I issue the stop command, it fails, and if I issue the status command, it says the service is not running. However, the logs show that the script is in fact running. I assume that I need a proper PID file and/or that, because the script runs in an infinite loop, it never completes, so the OK is never printed.

Service Wrapper:

#!/bin/sh
#
# /etc/init.d/hamonitor
# Subsystem file for "hamonitor" server
#
# chkconfig: 2345 95 05 (1)
# description: hamonitor server daemon
#
# processname: hamonitor

### BEGIN INIT INFO
# Provides: 
# Required-Start: 
# Required-Stop: 
# Should-Start: 
# Should-Stop: 
# Default-Start: 
# Default-Stop: 
# Short-Description: 
# Description:      
### END INIT INFO

# source function library
. /etc/rc.d/init.d/functions

PROG=hamonitor
EXEC=/etc/haproxy/hamonitor
LOCKFILE=/var/lock/subsys/$PROG
PIDFILE=/var/run/$PROG.pid
RETVAL=0

start() {
    echo -n $"Starting $PROG:"
    echo
    #daemon $EXEC &
    /etc/haproxy/hamonitor &
    RETVAL=$?
    if [ $RETVAL -eq 0 ]; then
      touch "$LOCKFILE"
      touch "$PIDFILE"
      echo "[ OK ]"
    else
      echo "[ FAIL: ${RETVAL} ]"
    fi
    return $RETVAL
}

stop() {
    echo -n $"Stopping $PROG:"
    echo 
    killproc $PROG -TERM
    RETVAL=$?
    if [ $RETVAL -eq 0 ]; then
      rm -f "$LOCKFILE"
      rm -f "$PIDFILE"
      echo "[ OK ]"
    else
      echo "[ FAIL: ${RETVAL} ]"
    fi
    return $RETVAL
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  status)
    status $PROG
    RETVAL=$?
    ;;
  restart)
    stop
    start
    ;;
  *)
    echo $"Usage: $0 {start|stop|status|restart}"
    RETVAL=1
esac

exit $RETVAL

App:

#!/usr/bin/env bash

export EC2_HOME=/opt/aws/apitools/ec2
export JAVA_HOME=/usr/lib/jvm/jre

AWS_ACCESS_KEY="XXXXXXXXXXXXXXXXXXXXXXXXX"
AWS_SECRET_KEY="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
VIP1="1.2.3.4"
VIP1_ALLOCATIONID="eipalloc-XXXXXXX"
THIS_NODE_EC2_ID="i-XXXXXXX"
THIS_NODE_PRIVATE_IPADDRESS1="10.60.0.11" 
THIS_NODE_HEALTHCHECK_URL="http://10.60.0.10/haproxy?monitor"
OTHER_NODE_HEALTHCHECK_URL="http://10.60.49.50/haproxy?monitor"
CHECK_OTHER_INTERVAL=5
CHECK_OTHER_FAIL_COUNT=0
CHECK_OTHER_RUN_COUNT=0
AFTER_TAKEOVER_WAIT=30

function takeover_vips {
  /opt/aws/bin/ec2-associate-address -aws-access-key ${AWS_ACCESS_KEY} -aws-secret-key ${AWS_SECRET_KEY} -a ${VIP1_ALLOCATIONID} -i ${THIS_NODE_EC2_ID} -private-ip-address ${THIS_NODE_PRIVATE_IPADDRESS1} -allow-reassociation > /dev/null
}

function does_this_node_have_ips {
  is_active=$(/opt/aws/bin/ec2-describe-addresses -aws-access-key ${AWS_ACCESS_KEY} -aws-secret-key ${AWS_SECRET_KEY}  | grep ${VIP1} | grep ${THIS_NODE_EC2_ID})
  if [ "$is_active" = "" ]; then
    echo "no"
  else
    echo "yes"
  fi
}

function log_msg {
  msg=$1
  msg="$(date) -- ${msg}"
  echo ${msg} >> /var/log/hamonitorlog
}

while true; do
    healthcheck_response=$(curl -sL -w "%{http_code}" "${OTHER_NODE_HEALTHCHECK_URL}" -o /dev/null)
    if [ "$healthcheck_response" != "200" ]; then
        CHECK_OTHER_FAIL_COUNT=$((CHECK_OTHER_FAIL_COUNT+1))
        if [ "$CHECK_OTHER_FAIL_COUNT" -gt 2 ]; then
          takeover_vips
          CHECK_OTHER_FAIL_COUNT=0
          sleep ${AFTER_TAKEOVER_WAIT}
        fi
    fi
    sleep ${CHECK_OTHER_INTERVAL}
done

Upvotes: 3

Views: 5081

Answers (1)

artless-noise-bye-due2AI

Reputation: 22469

Some Linux distributions use Upstart or another init system; I assume you have SysV init. chkconfig is used to maintain the runlevel symlinks. You should confirm that the comment

# chkconfig: 2345 95 05 (1)

is correct for your system.
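One quick way to check it: chkconfig documents the header as "# chkconfig: <runlevels> <start priority> <stop priority>", where the runlevel list may be "-". The sketch below is a hypothetical helper (not part of chkconfig) that pulls that line out of an init script and flags anything beyond those three fields, such as a trailing "(1)".

```shell
#!/usr/bin/env bash
# check_chkconfig_header FILE  (hypothetical helper, not part of chkconfig)
# Prints the runlevels and start/stop priorities from an init script's
# "# chkconfig:" line, and fails unless the line has exactly the three
# documented fields: <runlevels or -> <start priority> <stop priority>.
check_chkconfig_header() {
  local line
  local -a fields
  local levels_re='^([0-6]+|-)$' prio_re='^[0-9]+$'

  line=$(grep -m1 '^# chkconfig:' "$1") || { echo "no chkconfig line in $1" >&2; return 1; }
  # Keep only what follows "chkconfig:" and split it on whitespace.
  read -r -a fields <<< "${line#*chkconfig:}"

  if [ "${#fields[@]}" -ne 3 ]; then
    echo "expected 3 fields, got ${#fields[@]}: ${fields[*]}" >&2
    return 1
  fi
  if ! [[ ${fields[0]} =~ $levels_re && ${fields[1]} =~ $prio_re && ${fields[2]} =~ $prio_re ]]; then
    echo "malformed fields: ${fields[*]}" >&2
    return 1
  fi
  echo "runlevels=${fields[0]} start=${fields[1]} stop=${fields[2]}"
}
```

Run against the wrapper in the question (check_chkconfig_header /etc/init.d/hamonitor), it reports "expected 3 fields, got 4: 2345 95 05 (1)". After fixing the header, chkconfig --del hamonitor followed by chkconfig --add hamonitor recreates the symlinks from it.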

My guess is that hamonitor needs to be started through daemon. daemon is usually a shell function defined in an init script library such as /etc/rc.d/init.d/functions, and I would suggest using the daemon() function if it exists. Use one of the following:

  daemon $EXEC &                                               #option1
  nohup /etc/haproxy/hamonitor < /dev/null > /dev/null 2>&1 &  #option2
  /etc/haproxy/hamonitor&                                      #option3, 2 lines.
  disown $!                                                    #...

This is related to SIGCHLD and process exit status (see man wait for more). You may also need to detach hamonitor from the controlling terminal. In that case, use logger to send information to the system log. I assume the App script is the hamonitor code; if so, just change echo to logger.
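To see the exit-status point concretely: in Bash, $? immediately after "cmd &" is always 0, because it only reports that the job was started. That is also why RETVAL=$? right after "/etc/haproxy/hamonitor &" in the question can never report a failure. The child's real status is only available through wait, as this small demonstration (hypothetical function name) shows:

```shell
#!/usr/bin/env bash
# bg_status_demo (hypothetical name): contrasts $? after "cmd &" with wait's status.
bg_status_demo() {
  false &                  # background job that will exit with status 1
  local launched=$?        # status of *starting* an async job: always 0
  local pid=$!
  local exited=0
  wait "$pid" || exited=$? # block until that job ends and collect its exit status
  echo "launched=$launched exited=$exited"
}

bg_status_demo   # prints: launched=0 exited=1
```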

If hamonitor needs stdin, stdout, and/or stderr, redirect them to other files instead of the terminal. In that case, you might also consider running it under screen.
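A minimal sketch of that redirection, assuming you want to keep the output rather than discard it as option 2 does: stdin comes from /dev/null and stdout/stderr are appended to a log file. The helper name start_logged and any log path you pass it are hypothetical.

```shell
#!/usr/bin/env bash
# start_logged LOGFILE COMMAND [ARGS...]  (hypothetical helper)
# Starts COMMAND in the background with stdin from /dev/null and
# stdout/stderr appended to LOGFILE, so it holds no terminal streams open.
start_logged() {
  local logfile=$1; shift
  "$@" </dev/null >>"$logfile" 2>&1 &
  echo "$!"   # print the child's PID so the caller can record it
}
```

For example, start_logged /var/log/hamonitor.out /etc/haproxy/hamonitor > /var/run/hamonitor.pid (both paths are only illustrative).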

Edit: The last option can be used to create a proper PIDFILE. For instance,

  # !!! optional grabbing of lock here...
  /etc/haproxy/hamonitor &   # spawn in bg
  HA_PID=$!                  # record spawn pid
  echo "$HA_PID" > "$PIDFILE"  # record the PID to a file for `stop`.
  # !!! optional release of lock here...
  disown "$HA_PID"           # drop it from the shell's job table so it gets no SIGHUP.
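Filled out into start/status/stop helpers, the same PID-file idea might look like the following minimal sketch. The function names are hypothetical, it assumes bash (for disown), and it checks liveness with kill -0 instead of relying on killproc/status from the functions library. The status codes follow the LSB convention (0 running, 1 dead but PID file exists, 3 not running).

```shell
#!/usr/bin/env bash
# Hypothetical PID-file helpers; requires bash (disown, kill -0).

pidfile_start() {           # usage: pidfile_start PIDFILE COMMAND [ARGS...]
  local pidfile=$1; shift
  "$@" </dev/null >/dev/null 2>&1 &   # spawn in the background, off the caller's stdio
  echo "$!" > "$pidfile"              # record the PID for stop/status
  disown "$!"                         # drop it from this shell's job table
}

pidfile_status() {          # 0 = running, 1 = PID file but no process, 3 = no PID file
  local pidfile=$1 pid
  [ -s "$pidfile" ] || return 3
  read -r pid < "$pidfile"
  kill -0 "$pid" 2>/dev/null || return 1   # signal 0 only tests that the PID exists
}

pidfile_stop() {            # sends SIGTERM, then removes the PID file
  local pidfile=$1 pid
  [ -s "$pidfile" ] || return 1
  read -r pid < "$pidfile"
  kill -TERM "$pid" 2>/dev/null || return 1
  rm -f "$pidfile"
}
```

In the wrapper, start() could call pidfile_start "$PIDFILE" "$EXEC", with status and stop calling the other two.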

A service should never write to the terminal with echo and the like; logger is the better option. That is probably not your issue, though, unless hamonitor tries to read from the terminal. The main issue is that start() will wait for hamonitor to finish if you don't disown it, so the rc script's start never completes.
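As a sketch of the echo-to-logger change, here is a possible replacement for the App's log_msg. It assumes the standard logger utility, uses "hamonitor" as the syslog tag, and falls back to the original /var/log/hamonitorlog file if logger fails. The HAMONITOR_FALLBACK_LOG variable is a hypothetical addition.

```shell
#!/usr/bin/env bash
# Hypothetical logger-based replacement for the App's log_msg.
# HAMONITOR_FALLBACK_LOG is an assumed variable, not part of the original script.
HAMONITOR_FALLBACK_LOG=${HAMONITOR_FALLBACK_LOG:-/var/log/hamonitorlog}

log_msg() {
  local msg=$1
  # syslog adds its own timestamp, so only a tag and the text are sent.
  if ! logger -t hamonitor -- "$msg" 2>/dev/null; then
    # Fall back to the original file-based log if logger is missing or fails.
    printf '%s -- %s\n' "$(date)" "$msg" >> "$HAMONITOR_FALLBACK_LOG"
  fi
}
```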

More generally, you can look at /etc/rc.d/init.d/functions, post a link to your copy, or state your distribution and version (or at least its Linux Standard Base conformance, which seems to define how this should work in its various versions). The file can differ from one Linux distribution to the next. If you understand shell scripting, you can read it yourself to see which environment variables, files, etc. are expected and which of its functions your script uses. For instance, killproc is most likely defined there.
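For example, a minimal sketch that sources the library in a subshell, so nothing leaks into the caller, and reports which of the helpers the wrapper relies on (daemon, killproc, status) plus pidofproc it defines. The helper name list_init_helpers is hypothetical. The default path is the one the wrapper sources.

```shell
#!/usr/bin/env bash
# list_init_helpers [LIBRARY]  (hypothetical helper)
# Reports which init helpers a functions library defines as shell functions.
list_init_helpers() {
  local lib=${1:-/etc/rc.d/init.d/functions}
  [ -r "$lib" ] || { echo "$lib not found" >&2; return 1; }
  (
    . "$lib"   # sourced in a subshell so its settings stay contained
    for f in daemon killproc status pidofproc; do
      if [ "$(type -t "$f")" = function ]; then
        echo "$f: defined"
      else
        echo "$f: missing"
      fi
    done
  )
}
```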

Upvotes: 3
