duyguevrim

Reputation: 135

GCP Apache Airflow - How to install Python package from a private repository and import on DAG?

I have a private repository that holds functions shared across my DAGs (for example, datetime validators and a response encoder function). I want to import these functions in my DAG file, and I followed this link to do it.

I created a pip.conf file at my-bucket-name/config/pip/pip.conf and added my private GitHub repository to it like this:

[global]
extra-index-url=https://<token>@github.com/my-private-github-repo.git

After this, I tried to import the repository's functions in my DAG file (for example: from common-repo import *), but I got a 'module not found' error in my DAG. Unfortunately, I couldn't find anything in the Cloud Composer logs showing that the private GitHub repo had been installed.

I've searched a lot but can't find how to do this.

Upvotes: 3

Views: 1896

Answers (2)

Michel Metran

Reputation: 591

I have GitHub registered as an Airflow connection and didn't want to duplicate the token in a variable.

So I took another approach: reading the token from the connection and building the requirement with an f-string. I find it more readable, but I don't know whether it makes the solution less secure.

PS: GITHUB is the name of the connection.

from airflow import DAG
from airflow.decorators import task
from airflow.hooks.base import BaseHook

# Get the connection (top-level code, so this runs every time the DAG file is parsed)
conn = BaseHook.get_connection('GITHUB')


@task.virtualenv(
    task_id="virtualenv_python",
    requirements=[
        # The connection's password field holds the GitHub token
        f'git+https://{conn.password}@github.com/my-org/my-private-github-repo.git'
    ],
    system_site_packages=False,
)
def callable_from_virtualenv():
    # Import inside the function: the package only exists in the virtualenv
    import your_private_module

    # ... etc ...


virtualenv_task = callable_from_virtualenv()

As an added advantage, there is no need to add several variables (one for each private package).
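As a rough sketch of that advantage, one token can be reused to build the requirement strings for several private packages. The helper name build_git_requirements, the example token, and the second repository name are hypothetical and only here for illustration. In the DAG the token would come from conn.password as above.

```python
from urllib.parse import quote


def build_git_requirements(token, repos, host="github.com"):
    """Build one pip Git requirement per repo, all sharing the same token.

    `repos` are "org/name" strings. This helper is hypothetical, not part of Airflow.
    """
    # Percent-encode the token so special characters cannot break the URL
    safe_token = quote(token, safe="")
    return [f"git+https://{safe_token}@{host}/{repo}.git" for repo in repos]


# Placeholder token and repo names, for illustration only
requirements = build_git_requirements(
    "example-token",
    ["my-org/my-private-github-repo", "my-org/another-private-repo"],
)
print(requirements)
```

The resulting list could then be passed as the requirements argument of @task.virtualenv.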

I also tried Jinja templating (which would avoid the need to import BaseHook), but I didn't succeed:

git+https://{{ conn.GITHUB.password }}@github.com/my-org/my-private-github-repo.git

Upvotes: 0

Iñigo González

Reputation: 3955

You can add the private repo to the requirements of a PythonVirtualenvOperator like this:

from airflow import DAG
from airflow.decorators import task


@task.virtualenv(
    task_id="virtualenv_python",
    # pip needs the git+ prefix to treat the URL as a Git repository
    requirements=["git+https://<token>@github.com/my-private-github-repo.git"],
    system_site_packages=False,
)
def callable_from_virtualenv():
    # Import inside the function: the package only exists in the virtualenv
    import your_private_module

    # ... etc ...


virtualenv_task = callable_from_virtualenv()

(Example adapted from the Airflow Python operator example.)

To avoid hardcoding the token or credential in the source code, you can use an Airflow Variable like this:

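from airflow.decorators import task
from airflow.models import Variable


@task.virtualenv(
    task_id="virtualenv_python",
    # Assumes the Variable stores the full requirement string, token included
    requirements=[Variable.get("private_github_repo")],
    system_site_packages=False,
)
def callable_from_virtualenv():
    import your_private_module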
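Since the Variable has to hold a string that pip can install, it may be worth checking its shape before relying on it. pip recognises a Git requirement by the git+ scheme prefix. The function name is_pip_git_requirement below is hypothetical, and this is only a minimal sketch of such a check, not something Airflow provides.

```python
from urllib.parse import urlparse


def is_pip_git_requirement(value):
    """Return True if `value` looks like a pip Git-over-HTTPS requirement.

    Hypothetical helper: it only checks the URL shape, not whether the
    repository exists or the token is valid.
    """
    parsed = urlparse(value)
    # Needs the git+https scheme, a host, and a repository path
    return parsed.scheme == "git+https" and bool(parsed.netloc) and bool(parsed.path.strip("/"))


# Placeholder value for illustration; a real one would come from Variable.get(...)
print(is_pip_git_requirement("git+https://example-token@github.com/my-org/my-private-github-repo.git"))
```

A plain https:// URL without the git+ prefix fails this check, which matches how pip would treat it: as a link to an archive file, not a repository to clone.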

Upvotes: 7
