Abhishek Ranjan

Reputation: 931

Airflow Operator to pull data from external Rest API

I am trying to pull data from an external REST API and dump it to S3. I was thinking of writing an Airflow operator, rest-to-s3.py, which would pull in data from the external REST API.

My concerns are :

  1. This would be a long-running task; how do I keep track of failures?
  2. Is there a better alternative to writing an operator?
  3. Is it advisable to run a task that would probably take a couple of hours and wait on it?

I am fairly new to Airflow, so any guidance would be helpful.

Upvotes: 0

Views: 3063

Answers (1)

LiorH

Reputation: 18824

  1. Errors - one of the benefits of using a tool like Airflow is error tracking. Any failed task can be rerun (based on configuration), its state is persisted in the task history, etc. You can also branch on a task's status to decide whether to report the error, e.g. by email.
  2. An operator sounds like a valid option; another option is the built-in PythonOperator with a Python function you write.
  3. Long-running tasks are problematic with any design and tool. You are better off breaking the work down into small tasks (and maybe parallelizing their execution to reduce the run time). Does the API take a long time to respond, or do you send many calls? Maybe split based on the resulting S3 files, i.e. each file is a different DAG/branch?
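To make points 2 and 3 concrete, here is a minimal sketch. The helper `daily_chunks` (a hypothetical name, as is `fetch_day` and the example endpoint) splits a long pull into one-day windows; each window would then become its own `PythonOperator` task with retries and failure emails configured, so Airflow handles the error tracking for you. This is an untested illustration of the idea, not a drop-in DAG:

```python
import json
from datetime import date, timedelta
from urllib.request import urlopen  # in practice you might use requests

API_URL = "https://api.example.com/records"  # hypothetical endpoint


def daily_chunks(start, end):
    """Split a long pull into one-day windows so each becomes its own task."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def fetch_day(day_start, day_end):
    """Pull one day's records; a PythonOperator would call this per chunk."""
    url = f"{API_URL}?from={day_start}&to={day_end}"
    with urlopen(url) as resp:
        return json.load(resp)  # the real callable would then write to S3


# In the DAG file (assuming Airflow is installed), each chunk maps to a task:
#
# from airflow.operators.python import PythonOperator
#
# for start, end in daily_chunks(date(2018, 1, 1), date(2018, 1, 8)):
#     PythonOperator(
#         task_id=f"pull_{start}",
#         python_callable=fetch_day,
#         op_args=[start, end],
#         retries=3,              # Airflow reruns failed tasks
#         email_on_failure=True,  # and can alert you when retries are exhausted
#         dag=dag,
#     )
```

Because the chunks are independent tasks, a failed day is retried (and reported) on its own instead of restarting a multi-hour pull from scratch, and the scheduler can run several days in parallel.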

Upvotes: 5

Related Questions