I have a private GitHub repository that contains common functions for my DAGs (for example, datetime validators and a response encoder function). I want to import these functions in my DAG file, and I followed this link to do it.
I created a pip.conf file at my-bucket-name/config/pip/pip.conf and added my private GitHub repository to it like this:
[global]
extra-index-url=https://<token>@github.com/my-private-github-repo.git
After this, I tried to import the repository's functions in my DAG file (for example, from common-repo import *), but I got a 'module not found' error. Unfortunately, I couldn't find anything in the Cloud Composer logs showing that the private GitHub repo had been installed.
I've searched a lot but couldn't find out how to do this.
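A likely reason the pip.conf approach did nothing: pip's `extra-index-url` must point to a package index (a server implementing the PEP 503 "simple" API), not to a Git repository, so pip cannot discover a package at that URL. A Git repository is normally passed to pip as a VCS requirement with a `git+` prefix instead. The helper below is a hypothetical sketch (the name `git_requirement` and the `owner`/`repo`/`ref` parameters are illustrative assumptions, not part of any library) that builds such a requirement string from a token.

```python
from typing import Optional
from urllib.parse import quote


def git_requirement(token: str, owner: str, repo: str, ref: Optional[str] = None) -> str:
    """Build a pip VCS requirement for a private GitHub repository.

    Hypothetical helper: owner, repo and ref are placeholders, not values
    taken from the original question.
    """
    # Percent-encode the token so characters such as '@' or ':' cannot
    # break the userinfo part of the URL.
    safe_token = quote(token, safe="")
    url = f"git+https://{safe_token}@github.com/{owner}/{repo}.git"
    # Optionally pin a branch, tag or commit with "@<ref>".
    if ref:
        url += f"@{ref}"
    return url


if __name__ == "__main__":
    print(git_requirement("ghp_example", "my-org", "my-private-github-repo"))
    print(git_requirement("ghp_example", "my-org", "my-private-github-repo", ref="v1.0.0"))
```

Keep in mind that the repository itself must be pip-installable (it needs packaging metadata such as a `pyproject.toml` or `setup.py`), and that you import the package name defined there, not the repository name. `common-repo` is not a valid Python identifier, so `from common-repo import *` cannot work as written.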
I have GitHub registered as an Airflow connection, and I didn't want to duplicate the token in a Variable.
So I opted for another approach: building the requirement with an f-string. I find it more readable, but I don't know whether it makes the solution less secure.
PS: GITHUB is the name of the connection.
from airflow import DAG
from airflow.decorators import task
from airflow.hooks.base import BaseHook

# Get the connection (module-level code like this runs every time
# the DAG file is parsed)
conn = BaseHook.get_connection('GITHUB')

@task.virtualenv(
    task_id="virtualenv_python",
    requirements=[
        # The connection's password field holds the GitHub token
        f'git+https://{conn.password}@github.com/my-org/my-private-github-repo.git'
    ],
    system_site_packages=False
)
def callable_from_virtualenv():
    import your_private_module
    ...  # etc.

# Call this inside a `with DAG(...)` block so the task is attached to a DAG
virtualenv_task = callable_from_virtualenv()
As a bonus, you don't need to add several Variables (one for each private package!).
I also tried Jinja templating (which would avoid the need to import BaseHook), but I couldn't get it to work:
git+https://{{ conn.GITHUB.password }}@github.com/my-org/my-private-github-repo.git
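On the security question: whether the requirement is built with an f-string or read from a Variable, the final string contains the token in plain text, so it may end up in any output that echoes it. If you log requirement strings yourself, a small helper can mask the credentials first. The sketch below is hypothetical (`redact_credentials` is an illustrative name, not an Airflow API); it only changes what you print, not what pip receives.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_credentials(requirement: str) -> str:
    """Return `requirement` with any user/token in its URL replaced by ***.

    Hypothetical logging helper; it does not change what pip receives.
    """
    # PEP 508 direct reference such as "mypkg @ git+https://...":
    # redact only the URL part and keep the package name.
    if " @ " in requirement:
        name, url = requirement.split(" @ ", 1)
        return f"{name} @ {redact_credentials(url)}"

    parts = urlsplit(requirement)
    # Plain specifiers like "requests==2.31.0" have no network location.
    if "@" not in parts.netloc:
        return requirement

    # Keep only the host after the last "@" and mask everything before it.
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit(parts._replace(netloc=f"***@{host}"))


if __name__ == "__main__":
    print(redact_credentials(
        "git+https://ghp_secret@github.com/my-org/my-private-github-repo.git"
    ))
```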
You can add the private repo to the requirements in a PythonVirtualenvOperator like this:
from airflow import DAG
from airflow.decorators import task

@task.virtualenv(
    task_id="virtualenv_python",
    # pip needs the "git+" prefix to treat the URL as a Git (VCS) requirement
    requirements=["git+https://<token>@github.com/my-private-github-repo.git"],
    system_site_packages=False
)
def callable_from_virtualenv():
    import your_private_module
    ...  # etc.

virtualenv_task = callable_from_virtualenv()
(Example adapted from the Airflow Python operator example.)
To avoid hardcoding the token/credential in the source code, you can store the whole requirement string in an Airflow Variable, like this:
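from airflow.decorators import task
from airflow.models import Variable

# The Variable holds the full requirement string, e.g.
# git+https://<token>@github.com/my-private-github-repo.git
@task.virtualenv(
    task_id="virtualenv_python",
    requirements=[Variable.get("private_github_repo")],
    system_site_packages=False
)
def callable_from_virtualenv():
    import your_private_module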