Reputation: 61
My Airflow service runs as a Kubernetes deployment and has two containers, one for the webserver and one for the scheduler.
I'm running a task using a KubernetesPodOperator with the in_cluster=True parameter, and it runs fine; I can even kubectl logs pod-name and all the logs show up.
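For context, the task is defined roughly like this; a minimal sketch using the DAG and task ids from the log path below, the Airflow 1.10.x contrib import path, and a placeholder image (all other values are hypothetical):

# Minimal sketch of the task definition (hypothetical names, 1.10.x import path).
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

with DAG(
    dag_id="dag_name",
    start_date=datetime(2020, 5, 19),
    schedule_interval=None,
) as dag:
    task = KubernetesPodOperator(
        task_id="task_name",
        name="task-pod",                                   # hypothetical pod name
        namespace="airflow",                               # same namespace as the deployment
        image="registry.personal.io:5000/image/path",      # placeholder image
        in_cluster=True,                                   # use the in-cluster service account config
        get_logs=True,                                     # stream pod logs back to the task log
    )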
However, the airflow-webserver is unable to fetch the logs:
*** Log file does not exist: /tmp/logs/dag_name/task_name/2020-05-19T23:17:33.455051+00:00/1.log
*** Fetching from: http://pod-name-7dffbdf877-6mhrn:8793/log/dag_name/task_name/2020-05-19T23:17:33.455051+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='pod-name-7dffbdf877-6mhrn', port=8793): Max retries exceeded with url: /log/dag_name/task_name/2020-05-19T23:17:33.455051+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fef6e00df10>: Failed to establish a new connection: [Errno 111] Connection refused'))
It seems the webserver is unable to connect to the Airflow log-serving service on port 8793. If I kubectl exec into the container, I can curl localhost on port 8080, but not on ports 80 or 8793.
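For reference, the webserver fetches task logs with a plain HTTP GET against the worker's log-serving port (8793 by default), so the failure can be reproduced from inside the cluster; a small sketch using the exact URL from the error above, assuming the requests library is available:

# Reproduce the webserver's log fetch (URL taken verbatim from the error message).
import requests

url = ("http://pod-name-7dffbdf877-6mhrn:8793/log/dag_name/task_name/"
       "2020-05-19T23:17:33.455051+00:00/1.log")
try:
    resp = requests.get(url, timeout=5)
    print(resp.status_code, resp.text[:200])
except requests.ConnectionError as err:
    # Same "Connection refused" the webserver reports when nothing listens on 8793.
    print("connection failed:", err)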
Kubernetes deployment:
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-name
  namespace: airflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pod-name
  template:
    metadata:
      labels:
        app: pod-name
    spec:
      restartPolicy: Always
      volumes:
        - name: airflow-cfg
          configMap:
            name: airflow.cfg
        - name: dags
          emptyDir: {}
      containers:
        - name: airflow-scheduler
          args:
            - airflow
            - scheduler
          image: registry.personal.io:5000/image/path
          imagePullPolicy: Always
          volumeMounts:
            - name: dags
              mountPath: /airflow_dags
            - name: airflow-cfg
              mountPath: /home/airflow/airflow.cfg
              subPath: airflow.cfg
          env:
            - name: EXECUTOR
              value: Local
            - name: LOAD_EX
              value: "n"
            - name: FORWARDED_ALLOW_IPS
              value: "*"
          ports:
            - containerPort: 8793
            - containerPort: 8080
        - name: airflow-webserver
          args:
            - airflow
            - webserver
            - --pid
            - /tmp/airflow-webserver.pid
          image: registry.personal.io:5000/image/path
          imagePullPolicy: Always
          volumeMounts:
            - name: dags
              mountPath: /airflow_dags
            - name: airflow-cfg
              mountPath: /home/airflow/airflow.cfg
              subPath: airflow.cfg
          ports:
            - containerPort: 8793
            - containerPort: 8080
          env:
            - name: EXECUTOR
              value: Local
            - name: LOAD_EX
              value: "n"
            - name: FORWARDED_ALLOW_IPS
              value: "*"
Note: if Airflow is run in a dev environment (locally instead of on Kubernetes), it all works perfectly.
Upvotes: 4
Views: 15774
Reputation: 87
Creating a PersistentVolume and storing the logs on it might help.
kind: PersistentVolume
apiVersion: v1
metadata:
  name: testlog-volume
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 2Gi
  hostPath:
    path: /opt/airflow/logs/
  storageClassName: standard
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: testlog-volume
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: standard
If you are using the Helm chart to deploy Airflow, you can use:
--set executor=KubernetesExecutor --set logs.persistence.enabled=true --set logs.persistence.existingClaim=testlog-volume
Upvotes: 3
Reputation: 61
The problem was a bug in how the KubernetesPodOperator in Airflow v1.10.10 tried to launch the pods. Upgrading to Airflow 2.0 solved the issue.
Upvotes: 2
Reputation: 11
I have outlined our solution for Kubernetes executor logs here (no need for log volumes, Elasticsearch, or any other complex solutions): https://szeevs.medium.com/handling-airflow-logs-with-kubernetes-executor-25c11ea831e4 Our use case may be a little different, but you can adapt it to what you need.
Upvotes: 1
Reputation: 18824
Airflow deletes the pods after task completion; could it be that the pods are simply gone, so it can't access their logs?
To see if that's the case, try setting:
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS=False
When running Airflow on Kubernetes, I suggest using remote logging (e.g. S3); this way the logs are kept even when the pods are deleted.
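For illustration, remote logging to S3 is switched on with a few Airflow environment variables on every container; a minimal sketch with a hypothetical bucket and connection id (in Airflow 1.10.x the config section is CORE rather than LOGGING):

# Hypothetical values -- adjust the bucket and connection id to your environment.
# These would be added to the env section of both Airflow containers.
remote_logging_env = {
    "AIRFLOW__LOGGING__REMOTE_LOGGING": "True",
    "AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER": "s3://my-airflow-logs",  # hypothetical bucket
    "AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID": "aws_default",               # hypothetical connection id
}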
Upvotes: 1