Reputation: 1409
I have a dbt project running on Cloud Composer, and all my models and snapshots run successfully.
I'm having trouble generating the documentation once all the processing is finished.
The integration between dbt and Cloud Composer is done via airflow-dbt, and I have set up a task using the DbtDocsGenerateOperator.
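For reference, the task wiring looks roughly like the sketch below (project path, schedule and surrounding tasks are simplified/illustrative, assuming the airflow-dbt operators accept a dir argument pointing at the dbt project):

```python
from datetime import datetime

from airflow import DAG
from airflow_dbt.operators.dbt_operator import (
    DbtRunOperator,
    DbtSnapshotOperator,
    DbtDocsGenerateOperator,
)

# Illustrative: the dbt project lives alongside the DAGs in the Composer
# bucket, so dbt writes its target/ folder under dags/.
DBT_PROJECT_DIR = '/home/airflow/gcs/dags'

with DAG(
    dag_id='dbt_docs_example',
    start_date=datetime(2021, 11, 1),
    schedule_interval='@daily',
    catchup=False,
) as dag:
    dbt_run = DbtRunOperator(task_id='dbt_run', dir=DBT_PROJECT_DIR)
    dbt_snapshot = DbtSnapshotOperator(task_id='dbt_snapshot', dir=DBT_PROJECT_DIR)
    dbt_docs_generate = DbtDocsGenerateOperator(task_id='dbt_docs_generate', dir=DBT_PROJECT_DIR)

    dbt_run >> dbt_snapshot >> dbt_docs_generate
```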
The DAG actually runs fine, and I can see in the log that the catalog.json file is written to the target folder in the corresponding Cloud Storage bucket, but the file is not there.
Doing some investigation in the GCP logging, I've noticed that there's a process called gcs-syncd that is apparently removing the file.
Has anyone had success with this integration before and been able to generate the dbt docs from Cloud Composer?
{
insertId: "**********"
labels: {2}
logName: "************/logs/gcs-syncd"
receiveTimestamp: "****-**-****:**:33.621914158*"
resource: {2}
severity: "INFO"
textPayload: "Removing file:///home/airflow/gcs/dags/target/catalog.json"
timestamp: "****-**-****:**:28.220171689Z"
}
Followed by this error message:
{
insertId: "rdvl8sfx903ai0y8"
labels: {
compute.googleapis.com/resource_name: "***************"
k8s-pod/config_id: "************************"
k8s-pod/pod-template-hash: "*************"
k8s-pod/run: "airflow-worker"
}
logName: "************/logs/stderr"
receiveTimestamp: "****-**-****:**:28.921706522Z"
resource: {
labels: {6}
type: "k8s_container"
}
severity: "ERROR"
textPayload: "Removing file:///home/airflow/gcs/dags/target/catalog.json"
timestamp: "****-**-****:**:28.220171689Z"
}
The Airflow log doesn't show any errors at all, and the process succeeds with the message:
[2021-11-14 21:08:10,601] {dbt_hook.py:130} INFO - 21:08:10 |
[2021-11-14 21:08:10,679] {dbt_hook.py:130} INFO - 21:08:10 | Done.
[2021-11-14 21:08:10,682] {dbt_hook.py:130} INFO - 21:08:10 | Building catalog
[2021-11-14 21:08:43,054] {dbt_hook.py:130} INFO - 21:08:43 | Catalog written to /home/airflow/gcs/dags/target/catalog.json
[2021-11-14 21:08:43,578] {dbt_hook.py:132} INFO - Command exited with return code 0
[2021-11-14 21:08:47,822] {taskinstance.py:1213} INFO - Marking task as SUCCESS.
Upvotes: 1
Views: 924
Reputation: 34
The problem here is that you're writing your catalog file to a location on a worker node that is mounted to the dags/ folder in GCS, which Airflow and Cloud Composer manage. Per the documentation:
When you modify DAGs or plugins in the Cloud Storage bucket, Cloud Composer synchronizes the data across all the nodes in the cluster.
Cloud Composer synchronizes the dags/ and plugins/ folders uni-directionally by copying locally. Unidirectional synching means that local changes in these folders are overwritten.
The data/ and logs/ folders synchronize bi-directionally by using Cloud Storage FUSE.
If you change the location of this file to /home/airflow/gcs/data/target/catalog.json, you should be fine, as that folder syncs bi-directionally.
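For example, one way to do that (a minimal sketch, assuming the airflow-dbt operators accept a dir argument and that the dbt project itself can live under data/) is to point the operator's working directory at a project copy under the bi-directionally synced folder:

```python
from datetime import datetime

from airflow import DAG
from airflow_dbt.operators.dbt_operator import DbtDocsGenerateOperator

with DAG(
    dag_id='dbt_docs_to_data_folder',
    start_date=datetime(2021, 11, 1),
    schedule_interval='@daily',
    catchup=False,
) as dag:
    # Hypothetical project location: with the dbt project under the
    # bi-directionally synced data/ folder, dbt writes
    # data/dbt_project/target/catalog.json, which gcs-syncd won't remove.
    dbt_docs_generate = DbtDocsGenerateOperator(
        task_id='dbt_docs_generate',
        dir='/home/airflow/gcs/data/dbt_project',
    )
```

Alternatively, you may be able to leave the project under dags/ and only redirect the artifacts by setting target-path in dbt_project.yml to a location under /home/airflow/gcs/data/.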
Upvotes: 0