Abhijit

Reputation: 17

Airflow webserver pod unable to get logs from worker pod

I have deployed Airflow [2.9.3] in AKS, but when executing the DAGs I get the error below. I don't have any clue what needs to be updated. I am using the Helm chart [1.15.0] for deployment. I am totally clueless, please help.

Error

Could not read served logs: HTTPConnectionPool(host='file-sensor-waiting-for-trigger-file-in-nprd-vgcdjyz8', port=8793): Max retries exceeded with url: /log/dag_id=file_sensor/run_id=manual__2024-11-06T03:10:17.703058+00:00/task_id=waiting_for_trigger_file_in_nprd/attempt=4.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8f23131760>: Failed to establish a new connection: [Errno -2] Name or service not known'))
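For context, the hostname in the error is the worker/task pod that the webserver tries to reach on port 8793 to fetch "served logs". A quick way to check whether that hostname resolves from the webserver pod (a sketch, assuming the webserver deployment is named airflow-webserver and runs in the dfs-test namespace used further below):

# try to resolve the worker pod hostname from inside the webserver pod
kubectl exec -n dfs-test deploy/airflow-webserver -- \
  python -c "import socket; print(socket.gethostbyname('file-sensor-waiting-for-trigger-file-in-nprd-vgcdjyz8'))"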

Update: I made the changes below.

I created a "Connection" of type "Azure Blob Storage" from the web UI with the name "adlsgen2". The connection tested successfully.

(screenshot: Azure Blob Storage connection "adlsgen2")

Then I updated the "logging" section in values.yaml with the code below:

config:
  core:
    dags_folder: '{{ include "airflow_dags" . }}'
    # This is ignored when used with the official Docker image
    load_examples: 'False'
    executor: '{{ .Values.executor }}'
    # For Airflow 1.10, backward compatibility; moved to [logging] in 2.0
    colored_console_log: 'False'
    remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
  logging:
    # remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
    remote_logging: 'True'
    remote_log_con_id: 'adlsgen2'
    remote_base_log_folder: 'wasb://[email protected]/logs'
    delete_local_logs: 'True'
    colored_console_log: 'False'
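For reference, each [logging] key maps to an AIRFLOW__LOGGING__<KEY> environment variable, so the same settings can also be forced through the chart's top-level env: list (a minimal sketch, assuming the official chart's env: value, which adds variables to all Airflow containers):

env:
  # equivalent to config.logging.remote_logging / remote_log_conn_id above
  - name: AIRFLOW__LOGGING__REMOTE_LOGGING
    value: "True"
  - name: AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID
    value: "adlsgen2"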

Then I executed the commands below to roll out (restart) the DNS.

(screenshot: roll out the DNS)

Then I executed the command below to redeploy Airflow with Helm, and it deployed successfully.

helm upgrade airflow apache-airflow/airflow --namespace dfs-test -f values.yaml

(screenshot: pod details after installation)
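One way to confirm that the new values actually reached the running webserver is to query the live configuration from inside the pod with the standard airflow config get-value CLI (assuming the webserver deployment is named airflow-webserver, per the release name above):

kubectl exec -n dfs-test deploy/airflow-webserver -- airflow config get-value logging remote_logging
kubectl exec -n dfs-test deploy/airflow-webserver -- airflow config get-value logging remote_log_conn_id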

But I am getting the warning below when looking at the webserver pod.

(screenshot: startup probe failed)

When I access the webserver from the browser I am able to log in, and the earlier log error message is gone, but no logs are showing. Also, no logs are written to Blob Storage, where I created my connection for logging.

(screenshot: webserver not showing any logs)
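If the remote log handler is failing to upload, the exception usually shows up in the logs of the pod that ran the task (or the Celery worker, depending on the executor) rather than in the UI. One way to look for it, assuming the chart's default component=worker pod label:

kubectl logs -n dfs-test -l component=worker --tail=200 | grep -iE "wasb|azure|remote"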

Upvotes: 0

Views: 382

Answers (1)

Arko

Reputation: 3781

This error typically arises from misconfigured DNS within the cluster.

Deploy Airflow in its own namespace to isolate the environment.

kubectl create namespace airflow
kubectl config set-context --current --namespace=airflow

Add the Apache Airflow Helm repo:

helm repo add apache-airflow https://airflow.apache.org
helm repo update

Create a config file, for example airflow-values.yaml, to specify the configuration of the Airflow components, including the executor type, ingress, and logging settings:

executor: "CeleryExecutor"
airflowVersion: "2.9.3"

web:
  service:
    type: LoadBalancer
  defaultUser:
    enabled: true
    username: admin
    password: admin
  secretKeyRef: airflow-webserver-secret-key  

workers:
  replicas: 2

ingress:
  web:
    enabled: true
    ingressClassName: nginx  
    hosts:
      - name: "airflow.example.com"  
  flower:
    enabled: false

logging:
  remote_logging: true
  remote_log_conn_id: "logfile"
  worker_log_server_port: 8793
  remote_log_fetch_timeout: 60  
  remote_log_fetch_retries: 5   
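The web.secretKeyRef above points at a Kubernetes secret that has to exist before the chart is installed. A sketch of creating it, assuming the chart expects the data key webserver-secret-key:

kubectl create secret generic airflow-webserver-secret-key \
  --namespace airflow \
  --from-literal=webserver-secret-key="$(python3 -c 'import secrets; print(secrets.token_hex(16))')"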

Deploy Airflow using Helm with this customized configuration

helm install airflow apache-airflow/airflow --namespace airflow -f airflow-values.yaml

After deployment, verify that all the Airflow pods are running.

Sometimes CoreDNS may not resolve names correctly. Restart CoreDNS to ensure it's functioning properly.

kubectl rollout restart deployment coredns -n kube-system
kubectl get pods -n kube-system -l k8s-app=kube-dns

To validate that DNS resolution is working correctly, run an nslookup from a utility pod inside the cluster:

kubectl run -i --tty dnsutils --image=tutum/dnsutils --rm=true -- bash
nslookup airflow-worker.airflow.svc.cluster.local

As you can see, DNS resolution inside the cluster is working as expected; the airflow-worker.airflow.svc.cluster.local domain correctly resolves to multiple IP addresses.
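DNS resolution is only half of the original error; the webserver also has to reach the worker's log server on port 8793. A quick connectivity check from the webserver pod (a sketch, assuming the chart's default airflow-worker StatefulSet/headless service and that the requests library is present in the Airflow image; a 404 response still proves the port is reachable):

kubectl exec -n airflow deploy/airflow-webserver -- \
  python -c "import requests; print(requests.get('http://airflow-worker-0.airflow-worker.airflow.svc.cluster.local:8793/', timeout=5).status_code)"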

Update: In addition to the troubleshooting above, make sure permissions are configured correctly for the storage account and AKS.

Create an AKS cluster:

az aks create \
    --resource-group arkorg \
    --name arkoAKSCluster \
    --node-count 2 \
    --enable-managed-identity \
    --generate-ssh-keys \
    --location eastus
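Once the cluster is created, kubectl and helm need credentials for it; the standard command, using the resource group and cluster name from above:

az aks get-credentials --resource-group arkorg --name arkoAKSCluster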

Create a storage account:

az storage account create \
    --name arkoairflowstorage \
    --resource-group arkorg \
    --location eastus \
    --sku Standard_LRS

Create blob storage container for logs

az storage container create \
    --account-name arkoairflowstorage \
    --name airflowlogs

Retrieve storage account key

az storage account keys list \
    --resource-group arkorg \
    --account-name arkoairflowstorage \
    --query "[0].value" -o tsv
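If you want to reuse the key later (for example in the connection string below) without pasting it around, you can capture it in a shell variable with the same command:

STORAGE_KEY=$(az storage account keys list \
    --resource-group arkorg \
    --account-name arkoairflowstorage \
    --query "[0].value" -o tsv)
# print only the length so the secret itself is not echoed
echo "AccountKey captured: ${#STORAGE_KEY} characters"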

Here is the updated values file

executor: "CeleryExecutor"
airflowVersion: "2.9.3"

web:
  service:
    type: LoadBalancer
  defaultUser:
    enabled: true
    username: admin
    password: admin
  env:
    - name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
      value: "wasb://[email protected]/logs"
    - name: AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID
      value: "azure_blob_storage"

workers:
  replicas: 2
  env:
    - name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
      value: "wasb://[email protected]/logs"
    - name: AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID
      value: "azure_blob_storage"

scheduler:
  env:
    - name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
      value: "wasb://[email protected]/logs"
    - name: AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID
      value: "azure_blob_storage"

logging:
  remote_logging: true
  remote_log_conn_id: "azure_blob_storage"
  worker_log_server_port: 8793
  remote_log_fetch_timeout: 60
  remote_log_fetch_retries: 5

connections:
  - id: "azure_blob_storage"
    type: "wasb"
    extra: |
      {
        "connection_string": "DefaultEndpointsProtocol=https;AccountName=arkoairflowstorage;AccountKey=abcdefghijklmnopthisisasamplefakestringuseyourownoriginaloneyougetfromabovecommand==;EndpointSuffix=core.windows.net"
      }
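If you would rather not keep the account key in values.yaml, Airflow can also pick up connections from AIRFLOW_CONN_<CONN_ID> environment variables (the JSON form is supported from Airflow 2.3 onward). A sketch using a Kubernetes secret plus the chart's secret: list; the secret name airflow-azure-conn is just an example:

kubectl create secret generic airflow-azure-conn \
  --namespace airflow \
  --from-literal=AIRFLOW_CONN_AZURE_BLOB_STORAGE='{"conn_type": "wasb", "extra": {"connection_string": "<your-connection-string>"}}'

# in values.yaml: expose the secret as an env var in all Airflow containers
secret:
  - envName: "AIRFLOW_CONN_AZURE_BLOB_STORAGE"
    secretName: "airflow-azure-conn"
    secretKey: "AIRFLOW_CONN_AZURE_BLOB_STORAGE"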

A few modifications I made: CeleryExecutor is specified as the executor at the top level of values.yaml, outside the config section. I have also added a web section that defines the service type, credentials, and environment variables for remote logging.

Add the Airflow Helm Repository

helm repo add apache-airflow https://airflow.apache.org
helm repo update

Deploy Airflow to AKS

helm install airflow apache-airflow/airflow \
    --namespace airflow \
    --create-namespace \
    -f airflow-values.yaml

Verify Deployment

kubectl get pods -n airflow

From the Airflow UI, confirm the connection under Admin > Connections, and confirm logging to Blob Storage by running a sample DAG to generate logs.
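A quick way to generate and check logs without waiting for a schedule (the DAG id is whichever DAG you have; az storage blob list may also need --account-key or --auth-mode login, depending on your permissions):

# trigger a run from inside the webserver pod
kubectl exec -n airflow deploy/airflow-webserver -- airflow dags trigger <your_dag_id>

# list the blobs written by the remote log handler
az storage blob list \
    --account-name arkoairflowstorage \
    --container-name airflowlogs \
    --output table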

Upvotes: 2
