Imad

Reputation: 2741

Unable to pinpoint issue with GCP Composer (Airflow) DAG task failure

I am new to Apache Airflow. Some operators in my DAG have a failed status, and I am trying to understand where the error comes from.

Here are the details of the problem: my DAG is pretty big, and certain parts of it are composed of sub-DAGs. What I notice in the Composer UI is that the sub-DAGs that failed all did so in a task with task_id download_file, which uses XCom with a GoogleCloudStorageDownloadOperator.


>> GoogleCloudStorageDownloadOperator(
    task_id='download_file',
    bucket="sftp_sef",
    object="{{task_instance.xcom_pull(task_ids='find_file') | first }}",
    filename="/home/airflow/gcs/data/zips/{{{{ds_nodash}}}}_{0}.zip".format(table)
)

The logs of that sub-DAG do not show anything useful.

LOG :

[2020-04-07 15:19:25,618] {models.py:1359} INFO - Dependencies all met for
[2020-04-07 15:19:25,660] {models.py:1359} INFO - Dependencies all met for
[2020-04-07 15:19:25,660] {models.py:1577} INFO -
-------------------------------------------------------------------------------
Starting attempt 10 of 1

[2020-04-07 15:19:25,685] {models.py:1599} INFO - Executing on 2020-04-06T11:44:31+00:00
[2020-04-07 15:19:25,685] {base_task_runner.py:118} INFO - Running: ['bash', '-c', 'airflow run datamart_integration.consentement_email download_file 2020-04-06T11:44:31+00:00 --job_id 156313 --pool integration --raw -sd DAGS_FOLDER/datamart/datamart_integration.py --cfg_path /tmp/tmpacazgnve']

I am not sure if there is somewhere I am not checking... Here are my questions :

  1. How do I debug errors in my Composer DAGs in general?
  2. Is it a good idea to create a local Airflow environment to run and debug my DAGs locally?
  3. How do I verify if there are errors in XCom?

Upvotes: 1

Views: 1596

Answers (1)

Alexandre Moraes

Reputation: 4032

Regarding your three questions:

First, when using Cloud Composer you have several ways of debugging errors in your code. According to the documentation, you should:

  1. Check the Airflow logs.

These logs are related to single DAG tasks. You can view them in the logs folder of the environment's Cloud Storage bucket and in the Airflow web interface.

When you create a Cloud Composer environment, a Cloud Storage bucket is also created and associated with it. Cloud Composer stores the logs for single DAG tasks in the logs folder inside this bucket; each workflow folder has a folder for its DAGs and sub-DAGs. You can check its structure here.

Regarding the Airflow web interface, it is refreshed every 60 seconds. You can also read more about it here.

  2. Review Google Cloud's operations suite.

You can use Cloud Monitoring and Cloud Logging with Cloud Composer. Whereas Cloud Monitoring provides visibility into the performance and overall health of cloud-powered applications, Cloud Logging shows the logs that the scheduler and worker containers produce. Therefore, you can use both or just the one you find more useful for your needs.

  3. In the Cloud Console, check for errors on the pages for the Google Cloud components running your environment.

  4. In the Airflow web interface, check in the DAG's Graph View for failed task instances.

Thus, these are the steps recommended when troubleshooting your DAG.
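
As a rough sketch of where to look in the bucket for step 1: assuming the environment keeps Airflow 1.10's default log filename layout (dag_id/task_id/execution_date/try_number.log) under the bucket's logs folder, the path of one task attempt's log can be built as below. The function name task_log_path and the bucket name are hypothetical; the DAG, task and execution date are the ones from the log in the question.

```python
def task_log_path(bucket, dag_id, task_id, execution_date, try_number):
    """Build the gs:// path of one task attempt's log file.

    Assumes the default Airflow 1.10 layout under the bucket's logs/ folder;
    a customised log_filename_template would change this.
    """
    return "gs://{}/logs/{}/{}/{}/{}.log".format(
        bucket, dag_id, task_id, execution_date, try_number
    )


# Hypothetical bucket name; the other values are taken from the question's log.
path = task_log_path(
    bucket="my-composer-bucket",
    dag_id="datamart_integration.consentement_email",
    task_id="download_file",
    execution_date="2020-04-06T11:44:31+00:00",
    try_number=10,
)
print(path)
```

If the layout assumption holds for your environment, the printed path can be passed to gsutil cat to read the full log of that attempt.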

Second, regarding testing and debugging, it is recommended that you keep separate production and test environments to avoid DAG interference.

Furthermore, it is possible to test your DAG locally; there is a tutorial in the documentation about this topic, here. Testing locally allows you to identify syntax and task errors. However, I must point out that it won't be possible to check or evaluate dependencies and communication with the database.
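
As an illustration of the cheapest local check, here is a minimal sketch that only verifies that a DAG file's source compiles. It uses nothing but the Python standard library, so it does not import Airflow and will not catch task errors such as wrong operator arguments. The helper name check_dag_syntax and the sample sources are hypothetical.

```python
def check_dag_syntax(source, filename="<dag>"):
    """Return None if the source compiles, else a short error description.

    Only catches syntax errors: nothing is imported or executed, so missing
    packages or bad operator arguments are not detected.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return "{}:{}: {}".format(filename, err.lineno, err.msg)
    return None


# Made-up sources standing in for the contents of a DAG file.
good = "x = 1\n"
bad = "def broken(:\n    pass\n"

print(check_dag_syntax(good))
print(check_dag_syntax(bad, "bad_dag.py"))
```

For a real file you would read its contents first, for example with open(path).read(), and pass the file name as the second argument.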

Third, in general, in order to verify errors in XCom you should check:

  • Whether there is any error code/number;
  • Whether your syntax is correct, by comparing it with sample code from the documentation;
  • Whether the packages you are using are deprecated.

I would like to point out that, according to this documentation, GoogleCloudStorageDownloadOperator was renamed to GCSToLocalOperator.

In addition, I also encourage you to have a look at this code and documentation to check XCom syntax and errors.
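
To make the XCom check more concrete, here is a minimal sketch of two things I would verify for the download_file task: that the value pushed by find_file is something the first filter can work with, and that the filename string still contains a Jinja placeholder after .format() has run. The helper check_xcom_value is hypothetical and not part of Airflow; in a real task the value would come from task_instance.xcom_pull(task_ids='find_file'), and the sample object name is made up.

```python
def check_xcom_value(value):
    """Return the first element of an XCom value, or raise with a clear message.

    Mirrors what the template needs from `xcom_pull(...) | first`:
    a non-empty list (or tuple) whose first item is an object name.
    """
    if value is None:
        raise ValueError("XCom is empty: the upstream task pushed nothing")
    if not isinstance(value, (list, tuple)) or len(value) == 0:
        raise ValueError("expected a non-empty list, got {!r}".format(value))
    return value[0]


# Made-up value standing in for what find_file might push.
print(check_xcom_value(["2020/consentement_email.zip"]))

# str.format() turns the doubled braces back into a Jinja placeholder,
# so Airflow still sees {{ds_nodash}} when it renders the field.
table = "consentement_email"
filename = "/home/airflow/gcs/data/zips/{{{{ds_nodash}}}}_{0}.zip".format(table)
print(filename)
```

If find_file pushes None or an empty list, the object field cannot resolve to a real object name, so logging the pulled value in the upstream task is a quick way to rule this out.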

Feel free to share the error code with me if you need further help.

Upvotes: 2
