yee379
yee379

Reputation: 6752

get statistics of dag run times

i'm trying to export dag statistics out of Airflow. the statsd output is not very useful, so i decided to basically run a dag to query the SQL and export it out to say influxdb.

so it's easy enough to create a DAG to query the postgres airflow database. however, i'm a little stumped at the schema. i would have thought:

select run_id, start_date, end_date from dag_run where dag_id= 'blah';

would do it, but end end_date never appears to be populated.

all i'm really after is the total time from which the dag run started (where the first job is initiated as opposed to when the job is first put into a running state) and the time the dag went into a success state.

Upvotes: 1

Views: 4461

Answers (1)

joebeeson
joebeeson

Reputation: 4366

Try hitting the task_instance table:

  SELECT  execution_date
       ,  MIN(start_date) AS start
       ,  MAX(end_date) AS end
       ,  MAX(end_date) - MIN(start_date) AS duration
    FROM  task_instance
   WHERE  dag_id = 'blah'
     AND  state = 'success'
GROUP BY  execution_date
ORDER BY  execution_date DESC

Upvotes: 11

Related Questions