MJK

Reputation: 1401

How to run airflow scheduler as a daemon process?

I am new to Airflow. I am trying to run the Airflow scheduler as a daemon process, but the process does not live for long. I have configured "LocalExecutor" in the airflow.cfg file and ran the following command to start the scheduler. (I am using Google Compute Engine and accessing the server via PuTTY.)

airflow scheduler --daemon --num_runs=5 --log-file=/root/airflow/logs/scheduler.log

When I run this command, the airflow scheduler starts and I can see the airflow-scheduler.pid file in my airflow home folder, but the process does not live for long. When I close the PuTTY session and reconnect to the server, I cannot find the scheduler process. Am I missing something? How can I run the airflow scheduler as a daemon process?

Upvotes: 13

Views: 13085

Answers (3)

Floris

Reputation: 121

I had a similar problem. My Airflow scheduler did not keep running as a daemon process when I started it as a daemon:

airflow scheduler -D

But the scheduler did work when I ran it normally. After I deleted the airflow-scheduler.err file and reran the scheduler as a daemon process, it started working:

rm airflow-scheduler.err
airflow scheduler -D
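
The fix above can be wrapped in a small helper so the cleanup only happens when no scheduler is alive. This is a minimal sketch, not part of Airflow: the function names are hypothetical, and it assumes the .pid and .err files live in AIRFLOW_HOME (as the question describes for the .pid file) and that AIRFLOW_HOME defaults to ~/airflow.

```shell
# Assumption: daemon files are written to AIRFLOW_HOME.
AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"

# Succeeds only if the PID file exists and names a live process.
# kill -0 sends no signal; it only checks that the process can be signalled,
# so it also fails if the process belongs to another user.
scheduler_running() {
    pid_file="$AIRFLOW_HOME/airflow-scheduler.pid"
    [ -f "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null
}

# Removes the leftover .err file (the step from the answer above),
# then starts the scheduler as a daemon.
start_scheduler_daemon() {
    rm -f "$AIRFLOW_HOME/airflow-scheduler.err"
    airflow scheduler -D
}
```

Running `scheduler_running || start_scheduler_daemon` would then start the daemon only if it is not already up.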

Upvotes: 12

Dmitri Safine

Reputation: 843

You can use systemd or upstart as described here:

https://github.com/apache/incubator-airflow/tree/master/scripts/systemd
https://github.com/apache/incubator-airflow/tree/master/scripts/upstart

Here are the instructions in case the links break in the future.

The provided systemd files are tested on RedHat-based systems. Copy (or link) them to /usr/lib/systemd/system and copy the airflow.conf to /etc/tmpfiles.d/ or /usr/lib/tmpfiles.d/. Copying airflow.conf ensures /run/airflow is created with the right owner and permissions (0755 airflow airflow).

You can then start the different servers by using systemctl start [service]. Enabling services can be done by issuing

systemctl enable [service]

By default the environment configuration points to /etc/sysconfig/airflow. You can copy the "airflow" file into this directory and adjust it to your liking. Make sure to specify the SCHEDULER_RUNS variable.

With some minor changes they probably work on other systemd systems.
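
The copy steps described above could be scripted roughly as follows. This is only a sketch: the function name and the optional ROOT prefix are hypothetical (the prefix exists so the copy can be tried in a scratch directory first), and it assumes the three files have been downloaded into one source directory.

```shell
# install_airflow_units SRC_DIR [ROOT]
# Copies the unit file, the tmpfiles.d entry and the environment file from
# SRC_DIR to the locations named above. ROOT is a hypothetical prefix for
# trial runs; leave it empty to install into the real filesystem (needs root).
install_airflow_units() {
    src="$1"
    root="${2:-}"
    mkdir -p "$root/usr/lib/systemd/system" \
             "$root/etc/tmpfiles.d" \
             "$root/etc/sysconfig" &&
    cp "$src/airflow-scheduler.service" "$root/usr/lib/systemd/system/" &&
    cp "$src/airflow.conf" "$root/etc/tmpfiles.d/" &&
    cp "$src/airflow" "$root/etc/sysconfig/airflow"
}
```

After a real install, `systemctl daemon-reload` makes systemd pick up the new unit, and `systemctl enable airflow-scheduler` followed by `systemctl start airflow-scheduler` enables and starts it.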

You can modify the configuration files provided below to reflect your environment.

Content of /etc/sysconfig/airflow file

# This file is the environment file for Airflow. Put this file in /etc/sysconfig/airflow per default
# configuration of the systemd unit files.
#
# AIRFLOW_CONFIG=
# AIRFLOW_HOME=
#
# required setting, 0 sets it to unlimited. Scheduler will get restart after every X runs
SCHEDULER_RUNS=5

Content of /etc/tmpfiles.d/airflow.conf or /usr/lib/tmpfiles.d/airflow.conf file

D /run/airflow 0755 airflow airflow

Content of /usr/lib/systemd/system/airflow-scheduler.service

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow scheduler -n ${SCHEDULER_RUNS}
KillMode=process
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

Upvotes: 5

dieend

Reputation: 2299

--num_runs=5 makes the scheduler exit after 5 runs. You can remove that argument to keep the scheduler running.

Ideally you should run the scheduler under a supervisor, so that it is restarted whenever the process crashes or stops.
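
As one possible reading of this advice, assuming supervisord is the supervisor in use (the answer does not name one), a program entry could be generated like this. The output directory, the program name and the `airflow` user are assumptions; on many systems the file would go into supervisord's include directory instead of the current directory.

```shell
# Hypothetical target directory; defaults to the current directory so the
# file can be inspected before it is moved into supervisord's config path.
CONF_DIR="${CONF_DIR:-.}"

# autorestart=true asks supervisord to start the scheduler again whenever it
# exits. user=airflow only takes effect if supervisord itself runs as root.
cat > "$CONF_DIR/airflow-scheduler.conf" <<'EOF'
[program:airflow-scheduler]
command=airflow scheduler
user=airflow
autostart=true
autorestart=true
EOF
```

Once the file is in place, `supervisorctl reread` followed by `supervisorctl update` should load the new program. Note that the scheduler is started without -D here, since supervisord expects to manage a foreground process.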

Upvotes: 4
