Reputation: 11
I would like to run dbt using the GKEStartPodOperator Airflow operator, but I am struggling to find the proper way to authenticate dbt so that it can perform operations on Google Cloud BigQuery.
So here’s my profile in the profiles.yaml file:
my-profile:
  target: prod
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: production
      location: EU
      threads: 6
      job_execution_timeout_seconds: 28800 # 8 hours
      priority: "{{ env_var('BQ_PRIORITY', 'interactive') }}"
      retries: 3
    prod:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: production
      location: EU
      threads: 6
      job_execution_timeout_seconds: 28800 # 8 hours
      retries: 3
And here’s my GKEStartPodOperator configuration:
def default_k8s_args(dag, default_args, name="no-name"):
    return {
        "dag": dag,
        "default_args": default_args,
        "execution_timeout": timedelta(hours=12),
        "project_id": K8S_PROJECT_ID,
        "location": K8S_LOCATION,
        "cluster_name": K8S_CLUSTER_NAME,
        "name": f"dbt-{dag.dag_id}-{name}",
        "namespace": GCP_PROJECT,
        "is_delete_operator_pod": True,
        "container_resources": COMPUTE_RESOURCES,
        "startup_timeout_seconds": 600,
        "image": f"{DBT_IMAGE_REPO}:{DBT_IMAGE_TAG}",
        "image_pull_policy": "Always",
        "env_vars": {"SLACK_BOT_TOKEN": SLACK_CONN_ID, "PYTHONUNBUFFERED": "1"},
        "gcp_conn_id": GCP_PROJECT,
    }


dbt_pre_build_tests = GKEStartPodOperator(
    task_id="pre_build_tests",
    cmds=[
        "/bin/bash",
        "-c",
        "dbt test --target prod --select assert_pks",
    ],
    **default_k8s_args(dag, default_args, name="pre_build_tests"),
)
The pod launches successfully, but then I get the following error:
[2024-12-05, 14:39:08 UTC] {pod_manager.py:356} INFO - 14:39:08 Found 86 models, 16 tests, 3 snapshots, 0 analyses, 469 macros, 2 operations, 0 seed files, 91 sources, 0 exposures, 0 metrics
[2024-12-05, 14:39:08 UTC] {pod_manager.py:356} INFO - 14:39:08
[2024-12-05, 14:39:08 UTC] {pod_manager.py:356} INFO - 14:39:08 Encountered an error:
[2024-12-05, 14:39:08 UTC] {pod_manager.py:356} INFO - Database Error
[2024-12-05, 14:39:08 UTC] {pod_manager.py:356} INFO - [Errno 2] No such file or directory: '/etc/secrets/[GCP_PROJECT]/credentials.json'
[2024-12-05, 14:39:10 UTC] {pod_manager.py:424} ERROR - Error parsing timestamp (no timestamp in message ''). Will continue execution but won't update timestamp
[2024-12-05, 14:39:10 UTC] {pod_manager.py:356} INFO -
[2024-12-05, 14:39:10 UTC] {pod_manager.py:383} WARNING - Pod dbt-data-ext-omnitracking-morning-job-py-pre-build-tests-gbkd7d9v log read interrupted but container base still running
Can someone help me, please?
Thanks in advance.
Upvotes: 1
Views: 37
Reputation: 91769
Running dbt using GKEStartPodOperator within Cloud Composer can be thought of as simply running dbt on Kubernetes. For that reason, it's recommended to first test a manual deployment of the pod into the cluster; once that works, the same configuration can be copied into your DAG/operator and will behave exactly like your pod YAML.
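For illustration, a minimal pod manifest for such a manual test might look like the sketch below. The image reference and the secret name dbt-sa-key are placeholders, not values from your setup, and it assumes a keyfile-based profile of the kind described further down:

apiVersion: v1
kind: Pod
metadata:
  name: dbt-manual-test
spec:
  restartPolicy: Never
  containers:
    - name: dbt
      image: my-dbt-repo/dbt:latest  # placeholder for your DBT_IMAGE_REPO:DBT_IMAGE_TAG
      command: ["/bin/bash", "-c", "dbt test --target prod --select assert_pks"]
      volumeMounts:
        - name: dbt-keyfile
          mountPath: /etc/secrets/dbt
          readOnly: true
  volumes:
    - name: dbt-keyfile
      secret:
        secretName: dbt-sa-key  # placeholder: a Kubernetes secret holding credentials.json

Once kubectl apply -f on a manifest like this runs dbt successfully, the same image, command, and secret mount can be carried over to the operator.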
The profiles.yaml you have provided configures OAuth via gcloud. However, within a Kubernetes pod, gcloud may not be installed, and even if it were, it would need to be invoked manually by a human user to write an auth token to disk.
For this reason, you will need to configure a profile to use a service account keyfile. This keyfile must be generated in the same GCP project you intend to use with the BigQuery API, and the service account will need to be granted BigQuery permissions. For details on creating a service account keyfile, see the GCP IAM documentation on creating and managing service account keys.
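As a concrete sketch under those assumptions (the mount path /etc/secrets/dbt and the secret name dbt-sa-key are hypothetical, not taken from your setup), the prod target in profiles.yaml would switch from oauth to a keyfile:

    prod:
      type: bigquery
      method: service-account
      keyfile: /etc/secrets/dbt/credentials.json  # path where the keyfile is mounted in the pod
      project: my-gcp-project
      dataset: production
      location: EU
      threads: 6
      job_execution_timeout_seconds: 28800 # 8 hours
      retries: 3

The keyfile itself can then be mounted into the pod through the operator's secrets parameter, which GKEStartPodOperator inherits from KubernetesPodOperator:

from airflow.providers.cncf.kubernetes.secret import Secret

# Assumes a Kubernetes secret named "dbt-sa-key" with a "credentials.json" key
# already exists in the pod's namespace, e.g. created with:
#   kubectl create secret generic dbt-sa-key --from-file=credentials.json
dbt_keyfile = Secret(
    deploy_type="volume",              # mount the secret as a volume
    deploy_target="/etc/secrets/dbt",  # directory the keyfile appears under
    secret="dbt-sa-key",               # name of the Kubernetes secret
    key="credentials.json",            # key within the secret
)

dbt_pre_build_tests = GKEStartPodOperator(
    task_id="pre_build_tests",
    secrets=[dbt_keyfile],
    cmds=["/bin/bash", "-c", "dbt test --target prod --select assert_pks"],
    **default_k8s_args(dag, default_args, name="pre_build_tests"),
)

With this in place, dbt reads the keyfile at the path the profile points to, and no gcloud login is needed inside the pod.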
Upvotes: 0