jorgeavelar98

Reputation: 95

Git-sync sidecar container is not syncing GitHub repo DAGS into Airflow Kubernetes cluster properly

I'm attempting to add a git-sync sidecar container to my Airflow deployment YAML so that my private GitHub repo is synced to my Airflow Kubernetes environment every time I make a change in the repo.

So far, it successfully creates a git-sync container alongside our scheduler, worker, and web server containers, each in its respective pod (e.g. the scheduler pod contains a scheduler container and a git-sync container).

I looked at the git-sync container logs and it looks like it successfully connects with my private repo (using a personal access token) and prints success logs every time I make a change to my repo.

INFO: detected pid 1, running init handler
I0411 20:50:31.009097      12 main.go:401] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync","-wait=60","-repo=https://github.com/jorgeavelar98/AirflowProject.git","-branch=master","-root=/opt/airflow/dags","-username=jorgeavelar98","-password-file=/etc/git-secret/token"]
I0411 20:50:31.029064      12 main.go:950] "level"=0 "msg"="cloning repo" "origin"="https://github.com/jorgeavelar98/AirflowProject.git" "path"="/opt/airflow/dags"
I0411 20:50:31.031728      12 main.go:956] "level"=0 "msg"="git root exists and is not empty (previous crash?), cleaning up" "path"="/opt/airflow/dags"
I0411 20:50:31.894074      12 main.go:760] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="18d3c8e19fb9049b7bfca9cfd8fbadc032507e03"
I0411 20:50:31.907256      12 main.go:800] "level"=0 "msg"="adding worktree" "path"="/opt/airflow/dags/18d3c8e19fb9049b7bfca9cfd8fbadc032507e03" "branch"="origin/master"
I0411 20:50:31.911039      12 main.go:860] "level"=0 "msg"="reset worktree to hash" "path"="/opt/airflow/dags/18d3c8e19fb9049b7bfca9cfd8fbadc032507e03" "hash"="18d3c8e19fb9049b7bfca9cfd8fbadc032507e03"
I0411 20:50:31.911065      12 main.go:865] "level"=0 "msg"="updating submodules"

However, despite there being no errors in my git-sync container logs, I could not find any of the files in the destination directory where my repo is supposed to be synced (/opt/airflow/dags). As a result, no DAGs appear in the Airflow UI.

This is our scheduler containers/volumes YAML definition for reference. We have something similar for the workers and the web server.

      containers:
        - name: airflow-scheduler
          image: <redacted>
          imagePullPolicy: IfNotPresent
          envFrom:
            - configMapRef:
                name: "AIRFLOW_SERVICE_NAME-env"
          env:            
            <redacted>
          resources: 
            requests:
              memory: RESOURCE_MEMORY
              cpu: RESOURCE_CPU
          volumeMounts:
            - name: scripts
              mountPath: /home/airflow/scripts
            - name: dags-data
              mountPath: /opt/airflow/dags
              subPath: dags
            - name: dags-data
              mountPath: /opt/airflow/plugins
              subPath: plugins
            - name: variables-pools
              mountPath: /home/airflow/variables-pools/
            - name: airflow-log-config
              mountPath: /opt/airflow/config
          command:
            - "/usr/bin/dumb-init"
            - "--"
          args:
            <redacted>
        - name: git-sync
          image: registry.k8s.io/git-sync/git-sync:v3.6.5
          args:
            - "-wait=60"
            - "-repo=<repo>"
            - "-branch=master"
            - "-root=/opt/airflow/dags"
            - "-username=<redacted>"
            - "-password-file=/etc/git-secret/token"
          volumeMounts:
            - name: git-secret
              mountPath: /etc/git-secret
              readOnly: true
            - name: dags-data
              mountPath: /opt/airflow/dags
      volumes:
        - name: scripts
          configMap:
            name: AIRFLOW_SERVICE_NAME-scripts
            defaultMode: 493
        - name: dags-data
          emptyDir: {}
        - name: variables-pools
          configMap:
            name: AIRFLOW_SERVICE_NAME-variables-pools
            defaultMode: 493
        - name: airflow-log-config
          configMap:
            name: airflow-log-configmap
            defaultMode: 493
        - name: git-secret
          secret:
            secretName: github-token

What could the issue be? I couldn't find much documentation to help me investigate further. Any help and guidance would be greatly appreciated!

Upvotes: 3

Views: 3593

Answers (2)

jorgeavelar98

Reputation: 95

It looks like my issue was that my worker, scheduler, and web server containers mounted the DAG volume differently from my git-sync container.

This is what I had:

      containers:
        - name: airflow-scheduler
          image: <redacted>
          imagePullPolicy: IfNotPresent
          envFrom:
            - configMapRef:
                name: "AIRFLOW_SERVICE_NAME-env"
          env:            
            <redacted>
          resources: 
            requests:
              memory: RESOURCE_MEMORY
              cpu: RESOURCE_CPU
          volumeMounts:
            - name: scripts
              mountPath: /home/airflow/scripts
            - name: dags-data
              mountPath: /opt/airflow/dags
              subPath: dags
            - name: dags-data
              mountPath: /opt/airflow/plugins
              subPath: plugins
            - name: variables-pools
              mountPath: /home/airflow/variables-pools/
            - name: airflow-log-config
              mountPath: /opt/airflow/config

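The following edits made it work. I removed the subPath from the DAG mount and dropped the plugins mount, so the Airflow container now mounts dags-data at /opt/airflow/dags exactly as the git-sync container does: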

      containers:
        - name: airflow-scheduler
          image: <redacted>
          imagePullPolicy: IfNotPresent
          envFrom:
            - configMapRef:
                name: "AIRFLOW_SERVICE_NAME-env"
          env:            
            <redacted>
          resources: 
            requests:
              memory: RESOURCE_MEMORY
              cpu: RESOURCE_CPU
          volumeMounts:
            - name: scripts
              mountPath: /home/airflow/scripts
            - name: dags-data
              mountPath: /opt/airflow/dags
            - name: variables-pools
              mountPath: /home/airflow/variables-pools/
            - name: airflow-log-config
              mountPath: /opt/airflow/config
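
For readers whose repository keeps DAGs in a subdirectory, here is a minimal sketch of one way to point Airflow at them without a subPath. It assumes git-sync v3 is run with -root=/opt/airflow/dags and an explicit -dest=repo (a hypothetical symlink name that is not in the original config), and that the repository has a top-level dags/ directory. AIRFLOW__CORE__DAGS_FOLDER is Airflow's environment override for the core dags_folder setting.

```yaml
# Sketch only, not the configuration from this thread.
# Assumptions: git-sync uses -root=/opt/airflow/dags and -dest=repo,
# and the repository contains a top-level dags/ directory.
containers:
  - name: airflow-scheduler
    env:
      # Follow the git-sync symlink to the current checkout.
      - name: AIRFLOW__CORE__DAGS_FOLDER
        value: /opt/airflow/dags/repo/dags
    volumeMounts:
      # Mount the whole volume, same path as the git-sync container, no subPath.
      - name: dags-data
        mountPath: /opt/airflow/dags
```

Mounting the whole volume keeps the symlink visible to Airflow, so each new sync is picked up when git-sync repoints it.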

Upvotes: 1

jccampanero

Reputation: 53461

Your problem is probably related to the directory structure you are defining across the different containers.

It is not clear from your question but, according to your container definitions, your git repository should contain at least dags and plugins as top-level directories:

/
├─ dags/
├─ plugins/

This structure resembles a typical Airflow folder layout, so I assume that is the one you configured.

If so, please try this slightly modified version of your Kubernetes configuration:

      containers:
        - name: airflow-scheduler
          image: <redacted>
          imagePullPolicy: IfNotPresent
          envFrom:
            - configMapRef:
                name: "AIRFLOW_SERVICE_NAME-env"
          env:            
            <redacted>
          resources: 
            requests:
              memory: RESOURCE_MEMORY
              cpu: RESOURCE_CPU
          volumeMounts:
            - name: scripts
              mountPath: /home/airflow/scripts
            - name: dags-data
              mountPath: /opt/airflow/dags
              subPath: dags
            - name: dags-data
              mountPath: /opt/airflow/plugins
              subPath: plugins
            - name: variables-pools
              mountPath: /home/airflow/variables-pools/
            - name: airflow-log-config
              mountPath: /opt/airflow/config
          command:
            - "/usr/bin/dumb-init"
            - "--"
          args:
            <redacted>
        - name: git-sync
          image: registry.k8s.io/git-sync/git-sync:v3.6.5
          args:
            - "-wait=60"
            - "-repo=<repo>"
            - "-branch=master"
            - "-root=/opt/airflow"
            - "-username=<redacted>"
            - "-password-file=/etc/git-secret/token"
          volumeMounts:
            - name: git-secret
              mountPath: /etc/git-secret
              readOnly: true
            - name: dags-data
              mountPath: /opt
      volumes:
        - name: scripts
          configMap:
            name: AIRFLOW_SERVICE_NAME-scripts
            defaultMode: 493
        - name: dags-data
          emptyDir: {}
        - name: variables-pools
          configMap:
            name: AIRFLOW_SERVICE_NAME-variables-pools
            defaultMode: 493
        - name: airflow-log-config
          configMap:
            name: airflow-log-configmap
            defaultMode: 493
        - name: git-secret
          secret:
            secretName: github-token

Note that the changes are all in the git-sync container: the root argument no longer ends in /dags (it is now /opt/airflow), and the dags-data volume is mounted at /opt.

If it doesn't work, please try setting and tweaking the value of the git-sync --dest flag; I think it could help as well.
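
As a rough illustration of the --dest suggestion: in git-sync v3, as far as I know, --dest names the symlink created under --root that points at the current checkout, and it defaults to the last path segment of the repository URL. The value repo below is a hypothetical name chosen for the example. The other arguments are copied from the question.

```yaml
# Sketch only: an explicit -dest gives the checkout a predictable path.
# "repo" is an assumed name; files would appear under /opt/airflow/dags/repo.
- name: git-sync
  image: registry.k8s.io/git-sync/git-sync:v3.6.5
  args:
    - "-wait=60"
    - "-repo=<repo>"
    - "-branch=master"
    - "-root=/opt/airflow/dags"
    - "-dest=repo"
    - "-username=<redacted>"
    - "-password-file=/etc/git-secret/token"
  volumeMounts:
    - name: git-secret
      mountPath: /etc/git-secret
      readOnly: true
    - name: dags-data
      mountPath: /opt/airflow/dags
```

The Airflow containers would then need to mount dags-data at the same path, without a subPath, for the symlink to be visible to them.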

Upvotes: 1
