Pranav Rustagi

Reputation: 2721

Airflow: Failing to push data back into the pipeline

I have set up a workflow which consists of two tasks.

In the second task, I am successfully pulling the data pushed into the pipeline by the first task. However, after processing the data, when I try to push it back into the pipeline, I get the error "INFO - Task exited with return code -9".

Here are the logs: [screenshot of the task log]

Why is XCom failing to push the data into the pipeline? How can I communicate data to other tasks?

Upvotes: 1

Views: 103

Answers (2)

Pranav Rustagi

Reputation: 2721

Identified issue:

The problem was with cross-communication, i.e. XCom, which is not suitable for passing DataFrames or large amounts of data through the pipeline. Reference: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/xcoms.html


Solution:

Since we cannot pass large data between tasks using XComs, we can use shared storage through which tasks can exchange data. XComs can then be used to pass the location of the files in shared storage, so that other tasks know where to fetch the data from (see the sketch below).
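A minimal sketch of this pattern, assuming the Airflow 2.4+ TaskFlow API and a pandas DataFrame; the directory, file name, and task names are illustrative, not taken from the original DAG:

```python
import os
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task

# Any location all workers can reach: an NFS mount, or S3/GCS via a client library.
SHARED_DIR = "/shared/airflow-data"


@dag(schedule=None, start_date=datetime(2023, 1, 1), catchup=False)
def shared_storage_example():
    @task
    def transform():
        # Stand-in for the real transformation producing a large DataFrame.
        df = pd.DataFrame({"value": range(1_000_000)})
        os.makedirs(SHARED_DIR, exist_ok=True)
        path = os.path.join(SHARED_DIR, "transformed.csv")
        df.to_csv(path, index=False)  # the large data goes to shared storage
        return path                   # only this small path string travels through XCom

    @task
    def load(path: str):
        df = pd.read_csv(path)        # fetch the data from the location passed via XCom
        print(f"Loaded {len(df)} rows from {path}")

    load(transform())


shared_storage_example()
```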

Upvotes: 0

Akshay

Reputation: 38

Return code -9 is mostly associated with an out-of-memory error: a negative return code means the process was killed by that signal, and signal 9 (SIGKILL) is typically sent by the kernel's OOM killer. For instance, how big is your data compared to the memory of the worker executing the task "primary_transform_task"? Get rid of unused variables and optimize for memory, or get a bigger worker node to accommodate the data. Also, remember that Airflow is best used for orchestration; data processing for large datasets should happen on Spark clusters, for example (sketched below).
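For illustration only, a hedged sketch of moving the heavy transformation off the Airflow worker using the Spark provider's SparkSubmitOperator; the DAG id, application path, and connection id are placeholders, and the PySpark job itself is assumed to exist separately:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="offload_to_spark_example",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # The Airflow worker only submits the job; the Spark cluster does the
    # memory-heavy processing, so the task process stays small.
    primary_transform_task = SparkSubmitOperator(
        task_id="primary_transform_task",
        application="/opt/jobs/transform.py",  # placeholder path to the PySpark job
        conn_id="spark_default",               # Spark connection configured in Airflow
    )
```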

Upvotes: 0
