Reputation: 109
I'm trying to run Airflow 2 locally with a postgres db (localhost). I can get the webserver running, however I can't get the scheduler to run at the same time as the webserver. Running airflow scheduler gives:
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2022-08-27 16:10:50,543] {scheduler_job.py:709} INFO - Starting the scheduler
[2022-08-27 16:10:50,544] {scheduler_job.py:714} INFO - Processing each file at most -1 times
[2022-08-27 16:10:50 -0500] [48113] [INFO] Starting gunicorn 20.1.0
[2022-08-27 16:10:50,546] {executor_loader.py:105} INFO - Loaded executor: SequentialExecutor
[2022-08-27 16:10:50 -0500] [48113] [INFO] Listening at: http://[::]:8793 (48113)
[2022-08-27 16:10:50 -0500] [48113] [INFO] Using worker: sync
[2022-08-27 16:10:50,550] {manager.py:160} INFO - Launched DagFileProcessorManager with pid: 48114
[2022-08-27 16:10:50,552] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:10:50 -0500] [48115] [INFO] Booting worker with pid: 48115
[2022-08-27 16:10:50,556] {settings.py:55} INFO - Configured default timezone Timezone('UTC')
[2022-08-27T16:10:50.567-0500] {manager.py:406} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2) when using sqlite. So we set parallelism to 1.
[2022-08-27 16:10:50 -0500] [48116] [INFO] Booting worker with pid: 48116
[2022-08-27 16:15:50,663] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:20:50,749] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:25:50,834] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:30:50,911] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:35:50,991] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
[2022-08-27 16:40:51,064] {scheduler_job.py:1231} INFO - Resetting orphaned tasks for active dag runs
I can run the db, scheduler, and webserver using airflow standalone, however my understanding is that this practice is really just for development, not for production, so I want to avoid it. Initializing the database gives me no issues. However, when I go to the webserver UI, it signals that no scheduler is running, and I then have to kill the UI to run airflow scheduler from the CLI. As the output above shows, control is never returned to my terminal from the scheduler unless I kill it, which means I can't get back to the webserver UI. How can I run the scheduler and the webserver simultaneously without killing either process for the other?
Upvotes: 0
Views: 2372
Reputation: 5096
Airflow has multiple core components, like the webserver and the scheduler, and these components run in separate processes. When you run airflow standalone, Airflow runs the webserver, the scheduler and the triggerer (a process which supports deferrable operators) in 3 separate processes (check the source code).
If you want to run them manually, you should run each service in a separate terminal or run them in the background:
airflow scheduler &
airflow webserver &
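If you also use deferrable operators, you can start the triggerer the same way. A minimal sketch, assuming a Bash shell and redirecting each component's output to a log file of your choosing (the file names here are just examples):
# start each component in the background, keeping its output in a separate log file
nohup airflow scheduler > scheduler.log 2>&1 &
nohup airflow webserver > webserver.log 2>&1 &
nohup airflow triggerer > triggerer.log 2>&1 &   # optional, only needed for deferrable operators
Running each component in its own terminal works the same way, just without the trailing & (and without nohup).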
Upvotes: 1