Mr. President

Reputation: 1549

Trying to run Apache Airflow on an Ubuntu server with systemd

I'm trying to run Airflow on an Ubuntu server with systemd. I have followed the quick start guide and the tutorial from the Airflow documentation, and I have managed to install Airflow and run it successfully with the command:

airflow webserver -p 8080

After setting up systemd and a lot of trial and error with the configuration files, I managed to get Airflow running with the command

sudo systemctl start airflow

Airflow kept running for a week, until today, when I restarted it with the command

sudo systemctl restart airflow

Running sudo systemctl status airflow now gives me one of the following two messages:

● airflow.service - Airflow webserver daemon
 Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)
 Active: activating (auto-restart) (Result: exit-code) since Wed 2018-09-12 09:23:01 UTC; 1s ago
Process: 3115 ExecStart=/opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid --daemon (code=exited, status=1/FAILURE)
Main PID: 3115 (code=exited, status=1/FAILURE)

Sep 12 09:23:01 server-service systemd[1]: airflow.service: Main process exited, code=exited, status=1/FAILURE
Sep 12 09:23:01 server-service systemd[1]: airflow.service: Unit entered failed state.
Sep 12 09:23:01 server-service systemd[1]: airflow.service: Failed with result 'exit-code'.

or

● airflow.service - Airflow webserver daemon
 Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)
 Active: active (running) since Wed 2018-09-12 09:23:54 UTC; 1s ago
Main PID: 3399 (airflow)
  Tasks: 1
 Memory: 56.1M
    CPU: 1.203s
 CGroup: /system.slice/airflow.service
         └─3399 /opt/miniconda3/bin/python /opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid --daemon

Sep 12 09:23:54 server-service systemd[1]: Stopped Airflow webserver daemon.
Sep 12 09:23:54 server-service systemd[1]: Started Airflow webserver daemon.
Sep 12 09:23:54 server-service airflow[3399]: [2018-09-12 09:23:54,372] {__init__.py:57} INFO - Using executor SequentialExecutor
Sep 12 09:23:55 server-service airflow[3399]:   ____________       _____________
Sep 12 09:23:55 server-service airflow[3399]:  ____    |__( )_________  __/__  /________      __
Sep 12 09:23:55 server-service airflow[3399]: ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
Sep 12 09:23:55 server-service airflow[3399]: ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
Sep 12 09:23:55 server-service airflow[3399]:  _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
Sep 12 09:23:55 server-service airflow[3399]:  
Sep 12 09:23:55 server-service airflow[3399]: [2018-09-12 09:23:55,124] [3399] {models.py:167} INFO - Filling up the DagBag from /root/airflow/dags

I think the first message is returned when systemd has failed to start airflow and the second message is returned when systemd is still in the process of starting airflow.

Since the first error message contains airflow.service: Service hold-off time over, scheduling restart., I thought I might have this problem, but running sudo systemctl enable airflow.service doesn't solve it (I think airflow.service is enabled anyway, as indicated by Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)).
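
For reference, systemctl status only shows the last few log lines; the full traceback behind status=1/FAILURE can be read with journalctl (generic systemd commands, nothing specific to this setup):

sudo journalctl -u airflow.service -n 100 --no-pager
# or follow the log live while restarting the unit:
sudo journalctl -u airflow.service -f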

In trying to solve the problem I found some weird things that I don't understand:

My airflow.service file looks like this:

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
PIDFile=/home/user/airflow/airflow-webserver.pid
User=%i
Group=%i
Type=simple
ExecStart=/opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid --daemon

Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
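
One of the things I'm unsure about is the User=%i / Group=%i pair: as far as I understand, %i only expands to something in template units such as airflow@.service, so in a plain airflow.service it is empty. Combined with Type=simple, the --daemon flag also looks suspicious to me, because it makes the webserver fork away from the process systemd is watching. A variant with a concrete user and without --daemon (a sketch I have not fully tested; the user name is just the one from the paths above) would look like this:

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
PIDFile=/home/user/airflow/airflow-webserver.pid
# concrete user instead of %i, which only expands in template units
User=user
Group=user
Type=simple
# no --daemon: with Type=simple, systemd expects the webserver to stay in the foreground
ExecStart=/opt/miniconda3/bin/airflow webserver -p 8080 --pid /home/user/airflow/airflow-webserver.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target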

Question: How do I solve these issues so that I can get airflow running with systemd?

Edit: After restarting the systemd daemon again I've managed to get airflow running (or at least it seems so). Running systemctl status airflow returns:

● airflow.service - Airflow webserver daemon
   Loaded: loaded (/lib/systemd/system/airflow.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-09-12 10:49:17 UTC; 6min ago
 Main PID: 30054
    Tasks: 0
   Memory: 388.0K
      CPU: 2.987s
   CGroup: /system.slice/airflow.service

Sep 12 10:49:22 server-service airflow[30031]:   File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
Sep 12 10:49:22 server-service airflow[30031]:     reraise(type(exception), exception, tb=exc_tb, cause=cause)
Sep 12 10:49:22 server-service airflow[30031]:   File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
Sep 12 10:49:22 server-service airflow[30031]:     raise value.with_traceback(tb)
Sep 12 10:49:22 server-service airflow[30031]:   File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
Sep 12 10:49:22 server-service airflow[30031]:     context)
Sep 12 10:49:22 server-service airflow[30031]:   File "/opt/miniconda3/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
Sep 12 10:49:22 server-service airflow[30031]:     cursor.execute(statement, parameters)
Sep 12 10:49:22 server-service airflow[30031]: sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: connection [SQL: 'SELECT connection.conn_id AS connection_conn_id \nFROM connection G
Sep 12 10:49:23 server-service systemd[1]: airflow.service: Supervising process 30054 which is not our child. We'll most likely not notice when it exits.

Unfortunately, I can't access Airflow in my browser. Moreover, starting Airflow with systemd or manually does not produce the desired files /run/airflow/webserver.pid and /home/user/airflow/airflow-webserver.pid. I've tried to check whether they exist elsewhere with sudo find ~/ -type f -name "webserver.pid", but this doesn't return anything.

I think that the message Supervising process 30054 which is not our child. We'll most likely not notice when it exits. has something to do with my problem, since I did not get this message when Airflow was running successfully with systemd in the past. Could it be that systemctl status airflow indicates that Airflow has been running for 6 min because systemd doesn't notice that the worker with pid 30054 is no longer active?

Edit 2: I have found out why the airflow-webserver.pid "is not created" by Airflow. When you run airflow webserver -p 8080, Airflow does create the .pid file, but when you stop the webserver systemd deletes the .pid file again (if Airflow does not do so itself). This explains why airflow-webserver.pid was not there, but it does not explain why webserver.pid is not in the /run/airflow directory.
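
As far as I can tell, /run is a tmpfs, so a /run/airflow directory would have to be recreated on every boot before the webserver can write webserver.pid there. The systemd scripts shipped with Airflow seem to handle this with a tmpfiles.d entry; a minimal sketch (the airflow user/group is an assumption, adjust to whatever user runs the service) would be:

# /usr/lib/tmpfiles.d/airflow.conf
D /run/airflow 0755 airflow airflow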

Upvotes: 16

Views: 13828

Answers (2)

Merlin

Reputation: 81

I know I'm digging up a slightly dated post, but I too was trying to figure out why I could not get the scheduler to run automatically when the server is running.

I did find a solution that works for me on Ubuntu 18.04 and 18.10, so hopefully this helps.

I provided a full write-up of how to install Airflow with PostgreSQL on the backend at the link here.

From the later part of my article: essentially it comes down to making a specific change to the airflow-scheduler.service file.

This is one of the ‘gotchas’ for an implementation on Ubuntu. The dev team that created Airflow designed it to run on a different distribution of Linux, and therefore there is a small (but critical) change that needs to be made so that Airflow will automatically run when the server is on. The default systemd service files initially look like this:

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

However, this will not work, as the ‘EnvironmentFile’ protocol doesn’t fly on Ubuntu 18. Instead, comment out that line and add in:

Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

You will likely want to create a systemd service file at least for the Airflow scheduler, and probably also for the webserver if you want the UI to launch automatically as well. Indeed we do want both in this implementation, so we will be creating two files, airflow-scheduler.service & airflow-webserver.service. Both will be copied to the /etc/systemd/system folder. These are as follows:


airflow-scheduler.service

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
#EnvironmentFile=/etc/default/airflow
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
User=airflow
Group=airflow
Type=simple
ExecStart=/home/ubuntu/anaconda3/envs/airflow/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

airflow-webserver.service

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
#EnvironmentFile=/etc/default/airflow
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
User=airflow
Group=airflow
Type=simple
ExecStart=/home/ubuntu/anaconda3/envs/airflow/bin/airflow webserver -p 8085 --pid /home/ubuntu/airflow/airflow-webserver.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Finally, with both of those files copied to the /etc/systemd/system folder by way of a superuser copy command (sudo cp), it is time to hit the ignition:

sudo systemctl enable airflow-scheduler
sudo systemctl start airflow-scheduler
sudo systemctl enable airflow-webserver
sudo systemctl start airflow-webserver
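
To double-check that both services actually came up, the usual systemd commands work here as well (nothing specific to this setup):

sudo systemctl status airflow-scheduler airflow-webserver
sudo journalctl -u airflow-webserver -f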

Upvotes: 8

villasv

Reputation: 6841

That error sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: connection indicates that your Airflow process is not reaching a database that has been initialized. Are you sure you ran airflow initdb before trying to set up the Airflow webserver?
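
If not, a quick check is to initialize the database under the same user and AIRFLOW_HOME that the systemd unit ends up using (the path below is only an example based on the paths in your question):

export AIRFLOW_HOME=/home/user/airflow
airflow initdb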

I have been running Airflow under systemd in my AWS Airflow Stack, where you can find the configuration parameters. I'll transcribe my config files here for the sake of completeness, but I couldn't tell just by looking why your config doesn't work.

My configuration is customized to make it work under the user ec2-user inside an Amazon Linux 2 machine, but I believe it should work for Ubuntu as well. Observe that because I'm running the database, redis and everything else on other machines, I removed them from the After section.

        /usr/bin/turbine:
            #!/bin/sh
            exec airflow scheduler

        /etc/sysconfig/airflow:
            AIRFLOW_HOME=/efs/airflow
            AIRFLOW__CELERY__DEFAULT_QUEUE=${queue}
            ... your environment configs
            AWS_DEFAULT_REGION=${AWS::Region}

        /usr/lib/systemd/system/airflow.service:
            [Unit]
            Description=Airflow daemon
            After=network.target
            [Service]
            EnvironmentFile=/etc/sysconfig/airflow
            User=ec2-user
            Group=ec2-user
            Type=simple
            ExecStart=/usr/bin/turbine
            Restart=always
            RestartSec=5s
            [Install]
            WantedBy=multi-user.target

        /usr/lib/tmpfiles.d/airflow.conf:
            D /run/airflow 0755 ec2-user ec2-user
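
That last tmpfiles.d entry is what guarantees /run/airflow exists after every boot. If you add one yourself, running systemd-tmpfiles once (a standard systemd command) creates the directory immediately instead of waiting for a reboot:

sudo systemd-tmpfiles --create /usr/lib/tmpfiles.d/airflow.conf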

In addition to those, I've set up a watcher service to make sure we're always using the latest environment file with systemd:

        /usr/lib/systemd/system/watcher.service:
            [Unit]
            Description=Airflow configuration watcher
            After=network.target
            [Service]
            Type=oneshot
            ExecStartPre=/usr/bin/systemctl daemon-reload
            ExecStart=/usr/bin/systemctl restart airflow
            [Install]
            WantedBy=multi-user.target

        /usr/lib/systemd/system/watcher.path:
            [Path]
            PathModified=/etc/sysconfig/airflow
            [Install]
            WantedBy=multi-user.target

Everything is set up with

systemctl enable airflow.service
systemctl enable watcher.path
systemctl start airflow.service
systemctl start watcher.path
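
As a quick sanity check of the watcher (assuming an extra comment line in the environment file is harmless in your setup), modifying the file should restart Airflow a few seconds later:

echo "# trigger reload" | sudo tee -a /etc/sysconfig/airflow
systemctl status airflow.service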

Upvotes: 0
