Reputation: 1306
I have a git repo on my local machine with some Python code: c:\repos\myrepo\src\test.py (the Python script I want Airflow to run/execute on a schedule).
It is hosted on github.
I have Airflow installed and running ("local install") on an EC2 instance. I can access the web page from my local dev machine at http://: and log in to the Airflow console.
I git cloned the code on the EC2 instance
I now want airflow to invoke a python script (test.py) on a recurring basis (once a day for example, at a specific time)
How do I do this? I am led to a dead end with the current instructions.
Details:
I went to the Apache Airflow site, Install page: https://airflow.apache.org/docs/apache-airflow/stable/start/index.html
There is a link: [Quick Start]
I clicked there:
I clicked: running Airflow locally (installed on an EC2 instance, not in Docker)
I was able to get to the web page/url
I enabled 'example_bash_operator' and 'example_python_operator', and clicked inside to look at the '<> Code' tab.
At this point, I am no closer to understanding what I need to do, to have Airflow execute code in a repo I have on a schedule (test.py).
Step by step, what do I need to do to create a new job that will execute my code?
I do not see these sample DAGs calling external code (code in another repo); all the Python code to be executed is contained within the example itself.
There are huge gaps in the instructions for someone trying to get up and running quickly.
On the Airflow home page: http://:/home
There is no [+] Add DAG button (no plus button) to add a DAG. Is that not the idea?
Also, I need help with the following: that would be helpful to get started, but ultimately I need to deploy jobs programmatically to the server.
Any and all help to get me across this canyon would be appreciated. I do not know whether I am supposed to add Airflow DAG code to my existing repo (wrapping my test.py code with the example DAG code; just lost here),
or whether I should create an 'airflow/' repo, put code there, package my code as a library, and import and call it from there.
Upvotes: 2
Views: 2936
Reputation: 31
Step 1. Locate $AIRFLOW_HOME and check the dags_folder setting in your airflow.cfg; this gives the path of your DAGs folder. Typically it is ~/airflow/dags/.
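If you are not sure where the DAGs folder is, airflow.cfg is a plain INI file, so you can read the dags_folder setting with the standard library. A minimal sketch (the sample config written here is a stand-in so the snippet is runnable; on your instance you would pass the real path, typically ~/airflow/airflow.cfg):

```python
import configparser

def find_dags_folder(cfg_path):
    """Read the [core] dags_folder setting from an airflow.cfg file."""
    parser = configparser.ConfigParser()
    parser.read(cfg_path)
    return parser.get("core", "dags_folder")

# Throwaway config file so this sketch runs anywhere; replace with
# your real airflow.cfg path on the EC2 instance.
with open("sample_airflow.cfg", "w") as f:
    f.write("[core]\ndags_folder = /home/ec2-user/airflow/dags\n")

print(find_dags_folder("sample_airflow.cfg"))  # /home/ec2-user/airflow/dags
```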
Step 2. Under your dags directory, add a file called test_dag.py with the following code:
# Importing a function from your repo is the recommended way to run your
# Python code in a DAG (though you can also execute the file directly; see
# Step 4). For this import to work, the parent directory of repos/ must be
# on the scheduler's PYTHONPATH, or your package must live inside the dags
# folder.
from repos.myrepo.src.test import test_function
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
}

dag = DAG(
    'dag_test',
    default_args=default_args,
    description='run_python',
    start_date=days_ago(1),      # required before a DAG can be scheduled
    schedule_interval=None,      # e.g. '0 6 * * *' to run daily at 06:00 UTC
)

start = DummyOperator(task_id='start', dag=dag)

run_this = PythonOperator(
    task_id='run_pythoncode',
    python_callable=test_function,
    dag=dag,
)

end = DummyOperator(task_id='end', dag=dag)

start >> run_this >> end
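The DAG's import assumes test.py exposes a callable. A minimal hypothetical test.py might look like the following (test_function is a name the DAG code chooses, not something Airflow requires; replace the body with your real logic):

```python
# test.py -- hypothetical contents; the function name must match the
# import in the DAG file.
def test_function():
    """The work you want Airflow to run on each scheduled interval."""
    result = sum(range(10))  # stand-in for your real logic
    print(f"test_function ran, result={result}")
    return result

if __name__ == "__main__":
    # Lets the same file be executed directly (see the BashOperator
    # alternative in Step 4) as well as imported by the DAG.
    test_function()
```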
Step 3. After saving your DAG, open the webserver in your browser and go to the DAGs tab; the newly created DAG will be listed under the Paused tab. To the left of the DAG there is an enable/disable toggle; click it to enable the DAG. Once it is enabled, click 'Trigger DAG' or the play button. [Please note that any errors in the DAG file (test_dag.py) will appear at the top of your DAGs page in Airflow.]
Step 4. *** Alternative approach *** to the DAG above. Let's say you want to run the Python file from your git repository directly instead of importing it in your DAG code:
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
}

dag = DAG(
    'dag_test',             # use a different dag_id if you keep both DAG files
    default_args=default_args,
    description='run_python',
    start_date=days_ago(1),
    schedule_interval=None,  # e.g. '0 6 * * *' to run daily at 06:00 UTC
)

start = DummyOperator(task_id='start', dag=dag)

run_this_as_a_file = BashOperator(
    task_id='run_python_code_from_repo',
    # Use the path where you cloned the repo on the EC2 instance -- the
    # c:\repos\... path is on your local Windows machine, not the EC2 host.
    bash_command='python /path/to/myrepo/src/test.py',
    dag=dag,
)

end = DummyOperator(task_id='end', dag=dag)

start >> run_this_as_a_file >> end
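The BashOperator just runs its command in a shell, much as subprocess does, so you can sanity-check that the interpreter and script path work on the EC2 host before wiring up the DAG. A self-contained sketch (it writes a throwaway script; in practice you would point at the test.py you cloned onto the instance):

```python
import os
import subprocess
import sys
import tempfile

# Throwaway script so this sketch runs anywhere; replace with the path
# to your cloned test.py on the EC2 instance.
script = os.path.join(tempfile.mkdtemp(), "test.py")
with open(script, "w") as f:
    f.write('print("hello from test.py")\n')

# This mirrors what the BashOperator's bash_command does: run the script
# in a child process and check its exit code.
proc = subprocess.run([sys.executable, script], capture_output=True, text=True)
print(proc.stdout.strip())            # hello from test.py
print("exit code:", proc.returncode)  # 0 -> Airflow would mark the task success
```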
Upvotes: 1