Maximilien Belinga

Reputation: 3186

Spark as a Linux Service

I've been tasked with deploying Spark into a production environment. I typically manage everything with Ansible. I've packaged up Zookeeper and Kafka and can deploy those as Linux services, but I'm having problems with Spark.

It just doesn't seem set up to be started/stopped as a service (I'm referring to init.d services). Is anyone running Spark in cluster mode, and do you have it set up to start/stop via an init.d script? Or what's the general consensus on how to set this up?
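For what it's worth, on a systemd-based host I suspect a unit file would be cleaner than init.d. Below is a rough, untested sketch of what I have in mind; the install path and the dedicated spark user/group are assumptions:

#!/bin/bash
# Untested sketch: install a systemd unit as an alternative to init.d.
# The Spark path and the 'spark' user/group are assumptions.
cat > /etc/systemd/system/spark-master.service <<'EOF'
[Unit]
Description=Apache Spark standalone master
After=network.target

[Service]
Type=forking
User=spark
Group=spark
ExecStart=/opt/spark-2.0.0-bin-hadoop2.7/sbin/start-master.sh
ExecStop=/opt/spark-2.0.0-bin-hadoop2.7/sbin/stop-master.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now spark-master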

Here's what I've already tried:

Spark init.d service script:

#!/bin/bash
#
# spark        Start/stop the Spark standalone master and workers
# chkconfig: 2345 90 10
# description: Apache Spark standalone cluster

SPARK_BASE_DIR=/opt/spark-2.0.0-bin-hadoop2.7
SPARK_SBIN=$SPARK_BASE_DIR/sbin
PID=''

# Spark's sbin scripts rely on spark-env.sh; bail out early if it's missing.
if [ -f "$SPARK_BASE_DIR/conf/spark-env.sh" ]; then
    source "$SPARK_BASE_DIR/conf/spark-env.sh"
else
    echo "$SPARK_BASE_DIR/conf/spark-env.sh does not exist. Can't run script."
    exit 1
fi


# Sets PID to the master's process id. Returns 1 if a master is
# running, 0 otherwise (note the inverted shell convention).
check_status() {
    PID=$(ps ax | grep 'org.apache.spark.deploy.master.Master' | grep java | grep -v grep | awk '{print $1}')

    if [ -n "$PID" ]; then
        return 1
    else
        return 0
    fi
}

start() {
    check_status
    if [ "$?" -ne 0 ]; then
        echo "Master already running"
        exit 1
    fi

    echo -n "Starting master and workers ...  "

    # Start the master and all workers as the spark user.
    su spark -c "$SPARK_SBIN/start-all.sh" &>/dev/null

    # Give the daemons a moment to come up before re-checking.
    sleep 5

    check_status
    if [ "$?" -eq 0 ]; then
        echo "FAILURE"
        exit 1
    fi

    echo "SUCCESS"
    exit 0
}

stop() {
    check_status
    if [ "$?" -eq 0 ]; then
        echo "No master running ..."
        return 1
    fi

    echo "Stopping master and workers ..."

    # Stop the master and all workers as the spark user.
    su spark -c "$SPARK_SBIN/stop-all.sh" &>/dev/null
    sleep 4

    echo "done"
    return 0
}

status() {
    check_status
    if [ "$?" -eq 0 ]; then
        echo "No master running"
        exit 1
    else
        echo -n "master running: "
        echo "$PID"
        exit 0
    fi
}

case "$1" in
    start)
    start
    ;;
    stop)
    stop
    ;;
    restart)
    stop
    start
    ;;
    status)
    status
    ;;
    *)
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
esac

exit 0
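To install it, I copy the script to /etc/init.d on the master and register it with the distribution's tool (untested sketch; chkconfig and update-rc.d rely on the header comments at the top of the script):

# Untested sketch: install the init script on the master node.
cp spark /etc/init.d/spark
chmod 755 /etc/init.d/spark

# Register it for boot with whichever tool the distribution uses:
chkconfig --add spark          # RHEL/CentOS
# update-rc.d spark defaults   # Debian/Ubuntu

service spark start
service spark status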

I'm running the service from the master node to start all the cluster nodes.
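Note that start-all.sh starts the workers over SSH using the host list in conf/slaves, so passwordless SSH from the master as the spark user is required. A quick connectivity check (sketch; the path and spark user are the same assumptions as above):

# Sketch: verify passwordless SSH to every worker listed in conf/slaves.
# BatchMode makes ssh fail instead of prompting for a password.
grep -vE '^[[:space:]]*(#|$)' /opt/spark-2.0.0-bin-hadoop2.7/conf/slaves |
while read -r host; do
    ssh -o BatchMode=yes "spark@$host" true \
        && echo "$host: ok" \
        || echo "$host: ssh FAILED"
done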

Some info about my environment:

Upvotes: 3

Views: 1567

Answers (1)

Maximilien Belinga

Reputation: 3186

I solved it. The issue was in my Ansible role: I hadn't set the group on the log folder's ownership. Now it works fine.
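In bash terms, the fix amounts to making sure the Spark log directory is owned by the right user and group; in the Ansible role this is the file module with both owner and group set. A rough equivalent (the spark user/group and the logs path are assumptions):

# Rough equivalent of the Ansible fix; the spark user/group and the
# default logs directory under the install path are assumptions.
install -d -o spark -g spark /opt/spark-2.0.0-bin-hadoop2.7/logs
# If the directory already exists with the wrong group:
chown -R spark:spark /opt/spark-2.0.0-bin-hadoop2.7/logs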

Upvotes: 3
