Neeraj

Reputation: 1817

How can I process data in Google Cloud Storage via Apache Airflow?

I have a CSV file in Google Cloud Storage, and I'm using Google Cloud Composer to run Apache Airflow. I would like to run some bash scripts on the CSV file and store the result back in Google Cloud Storage. I searched through the available operators but couldn't find one that processes files in Google Cloud Storage. Is there a way to do this?

Thanks in advance.

Upvotes: 1

Views: 3294

Answers (1)

medvedev1088

Reputation: 3745

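Here is an example that uses BashOperator to copy the file out of the bucket with gsutil, process it locally, and copy the result back: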

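from datetime import timedelta

from airflow.operators import bash_operator

# Download the CSV to the worker, process it, then upload the result.
# `process_file` is a placeholder for your own script; `dag` is assumed to be defined elsewhere.
process_csv = bash_operator.BashOperator(
    task_id="process_csv",
    bash_command="gsutil cp gs://your_bucket/your_file.csv your_file.csv && "
                 "process_file your_file.csv > processed_file.csv && "
                 "gsutil cp processed_file.csv gs://your_bucket/processed_file.csv",
    execution_timeout=timedelta(hours=1),
    dag=dag
)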

You can find more examples in this repository: https://github.com/blockchain-etl/bitcoin-etl-airflow/blob/develop/dags/bitcoinetl/build_export_dag.py

You can also use PythonOperator instead of BashOperator. Some examples can be found here: https://github.com/blockchain-etl/ethereum-etl-airflow/blob/master/dags/export_dag.py
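
For the PythonOperator route, here is a rough sketch of what the callable might look like. It is not taken from the linked repository. It assumes the google-cloud-storage client library is available on the workers and recent enough to provide Blob.download_as_text, and that the old-style `airflow.operators.python_operator` import path matches your Airflow version. The names transform_csv, process_gcs_csv and make_process_task, as well as the bucket and file names, are made up for illustration, and the whitespace-stripping step just stands in for whatever processing you actually need.

```python
import csv
import io


def transform_csv(text):
    """Placeholder processing step: strip whitespace from every cell."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()


def process_gcs_csv(bucket_name, source_name, dest_name, client=None):
    """Download a CSV from GCS, transform it in memory, and upload the result."""
    if client is None:
        # Imported lazily so the module can be loaded without the library installed.
        from google.cloud import storage
        client = storage.Client()
    bucket = client.bucket(bucket_name)
    text = bucket.blob(source_name).download_as_text()
    bucket.blob(dest_name).upload_from_string(
        transform_csv(text), content_type="text/csv"
    )


def make_process_task(dag):
    """Wrap the callable in a PythonOperator attached to an existing DAG."""
    # Assumption: Airflow 1.10-style import path; adjust for your Airflow version.
    from airflow.operators import python_operator

    return python_operator.PythonOperator(
        task_id="process_csv_python",
        python_callable=process_gcs_csv,
        op_kwargs={
            "bucket_name": "your_bucket",  # hypothetical names
            "source_name": "your_file.csv",
            "dest_name": "processed_file.csv",
        },
        dag=dag,
    )
```

This reads the whole file into memory, so it only suits files that fit comfortably on a worker. For larger files the gsutil approach above, or downloading to a temporary file, is probably the safer choice.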

Upvotes: 3
