Reputation: 11
I'm trying to run a task in an environment built from an image in a private Google Container Registry through the KubernetesPodOperator of the Google Cloud Composer. The Container Registry and Cloud Composer instances are under the same project. My code is below.
import datetime
import airflow
from airflow.contrib.operators import kubernetes_pod_operator
YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)
# Create an Airflow DAG for the pipeline
with airflow.DAG(
        'my_dag',
        schedule_interval=datetime.timedelta(days=1),
        start_date=YESTERDAY) as dag:
    my_task = kubernetes_pod_operator.KubernetesPodOperator(
        task_id='my_task',
        name='my_task',
        cmds=['echo 0'],
        namespace='default',
        image=f'gcr.io/<my_private_repository>/<my_image>:latest')
The task fails, and I get the following error message both in the logs in the Airflow UI and in the logs folder of the environment's storage bucket.
[2020-09-21 08:39:12,675] {taskinstance.py:1147} ERROR - Pod Launching failed: Pod returned a failure: failed
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/contrib/operators/kubernetes_pod_operator.py", line 260, in execute
    'Pod returned a failure: {state}'.format(state=final_state)
airflow.exceptions.AirflowException: Pod returned a failure: failed
This is not very informative... Any idea what I could be doing wrong? Or is there anywhere I can find more informative log messages?
Thank you very much!
Upvotes: 0
Views: 1812
Reputation: 31
If you want to pull or push an image from a private registry with KubernetesPodOperator, you should create a Secret in Kubernetes that contains a service account (SA) key. This SA should have permission to pull, or possibly push, images (read-only or read-write access). Then reference this Secret in the KubernetesPodOperator via the image_pull_secrets argument:
my_task = kubernetes_pod_operator.KubernetesPodOperator(
    task_id='my_task',
    name='my_task',
    cmds=['echo 0'],
    namespace='default',
    image=f'gcr.io/<my_private_repository>/<my_image>:latest',
    image_pull_secrets='your_secret_name')
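For a GCR image, one way to create such a Secret is kubectl's docker-registry secret type, fed with a service-account key file. A sketch, assuming a key file key.json for an SA with read access to the registry (e.g. roles/storage.objectViewer); the secret name and file path are placeholders:

```shell
# Create a docker-registry Secret from a GCP service-account key.
# "_json_key" is the fixed username GCR expects for JSON-key auth.
kubectl create secret docker-registry your_secret_name \
  --docker-server=gcr.io \
  --docker-username=_json_key \
  --docker-password="$(cat key.json)" \
  --docker-email=unused@example.com \
  --namespace=default
```

The namespace must match the one passed to the operator (default here, as in the question).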
Upvotes: 0
Reputation: 5253
In general, the way to start troubleshooting GCP Composer after a DAG run fails is explained well in the dedicated chapter of the GCP documentation.
Moving on to issues specific to KubernetesPodOperator, an investigation might consist of the following.
Analyzing the error context and the kubernetes_pod_operator.py source code further, I assume this issue occurs due to a Pod launching problem on the Airflow worker GKE node, ending with the 'Pod returned a failure: {state}'.format(state=final_state)
message once the Pod execution is not successful.
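To inspect that GKE cluster directly, kubectl first needs credentials for it; these can be fetched with gcloud, for instance (cluster name, zone, and project ID are placeholders):

```shell
# Fetch kubectl credentials for the Composer environment's GKE cluster
gcloud container clusters get-credentials <composer-gke-cluster> \
    --zone <zone> --project <project-id>
```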
Personally, I prefer to check the image by running it before executing the Airflow task in a Kubernetes Pod. With that said, and based on the task command provided, I believe you can verify the Pod launching process by connecting to the GKE cluster and redrafting the kubernetes_pod_operator.KubernetesPodOperator definition into its kubectl
command-line equivalent:
kubectl run test-app --image=eu.gcr.io/<Project_ID>/image --command -- "/bin/sh" "-c" "echo 0"
This simplifies the process of image validation, and you'll also be able to get a closer look at the Pod's logs and event records:
kubectl describe po test-app
Or
kubectl logs test-app
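One side note on the cmds value in the question: each list element becomes a separate argv entry of the container's entrypoint, just like the separately quoted arguments in the kubectl command above, so a single element 'echo 0' names one executable literally called "echo 0". A quick local sketch of that splitting behavior, using Python's subprocess as an analogy (not Kubernetes itself):

```python
import subprocess

# A single list element is one argv entry: the OS looks for an
# executable literally named "echo 0", which does not exist.
try:
    subprocess.run(['echo 0'])
    print('unexpectedly ran')
except FileNotFoundError:
    print("no executable named 'echo 0'")

# Split into separate elements, the command runs normally.
result = subprocess.run(['echo', '0'], capture_output=True, text=True)
print(result.stdout.strip())  # -> 0
```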
Upvotes: 1