Paul1911

Reputation: 193

Read local .csv into containerized Airflow

I am currently doing a project for my university where I set up an ML workflow in Airflow, containerized with Docker and started via a docker-compose file. The starting point should be a .csv file, which I have available locally (or on GitHub). I do not yet understand how I can get this .csv into Airflow (I quickly realized that pd.read_csv does not work, since the code runs inside the container when the function is executed). What are my options, and which one is best?

(Most tutorials I have found import their DataFrames directly from sklearn or Kaggle, but that is not an option for me.)

Thanks already!

Upvotes: 2

Views: 2493

Answers (1)

Driss NEJJAR

Reputation: 978

You have to add the csv file to a volume that is mounted into the Airflow container.

If, for example, your DAGs are mounted like this:

    volumes:
        - ./dags:/usr/local/airflow/dags
        - ./logs-volume:/usr/local/airflow/logs

you can put your file directly under dags/file.csv, and it will appear at /usr/local/airflow/dags/file.csv inside the container.
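A task can then read it from that container path. Here is a minimal sketch, assuming the /usr/local/airflow/dags mount target from the compose snippet above and a file named file.csv (on older Airflow 1.10 images the operator import is airflow.operators.python_operator instead):

    from datetime import datetime

    import pandas as pd
    from airflow import DAG
    from airflow.operators.python import PythonOperator  # Airflow 2.x import path

    # Path inside the container: the mount target plus the file name.
    CSV_PATH = "/usr/local/airflow/dags/file.csv"

    def load_csv():
        # pd.read_csv works here because the task runs inside the container,
        # where the host file is visible through the volume mount.
        df = pd.read_csv(CSV_PATH)
        print(df.head())

    with DAG(
        dag_id="read_local_csv",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        load_task = PythonOperator(task_id="load_csv", python_callable=load_csv)

The key point is that the path refers to the filesystem inside the container, not on your host.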

To check that your file is actually visible inside the container, start it and run the following:

    docker ps

which will result in:

    CONTAINER ID   IMAGE               COMMAND                  CREATED             STATUS         PORTS                                        NAMES
    8bffd2dad332   airflow:latest      "/entrypoint.sh webs…"   About an hour ago   Up 6 seconds   5555/tcp, 8793/tcp, 0.0.0.0:8080->8080/tcp   webserver_1
    f65bf73811cb   postgres:9.6        "docker-entrypoint.s…"   4 hours ago         Up 7 seconds   0.0.0.0:53468->5432/tcp                      postgres_1

and then you can run:

    docker exec -it 8bffd2dad332 /bin/bash

then you can simply list the directory to see your file:

    ls dags/

Upvotes: 2
