Reputation: 6752
i'm trying to export dag statistics out of Airflow. the statsd output is not very useful, so i decided to basically run a dag to query the SQL and export it out to say influxdb.
so it's easy enough to create a DAG to query the postgres airflow database. however, i'm a little stumped at the schema. i would have thought:
select run_id, start_date, end_date from dag_run where dag_id= 'blah';
would do it, but end end_date
never appears to be populated.
all i'm really after is the total time from which the dag run started (where the first job is initiated as opposed to when the job is first put into a running
state) and the time the dag went into a success
state.
Upvotes: 1
Views: 4461
Reputation: 4366
Try hitting the task_instance
table:
SELECT execution_date
, MIN(start_date) AS start
, MAX(end_date) AS end
, MAX(end_date) - MIN(start_date) AS duration
FROM task_instance
WHERE dag_id = 'blah'
AND state = 'success'
GROUP BY execution_date
ORDER BY execution_date DESC
Upvotes: 11