Reputation: 193
I am currently doing a project for my university where I set up an ML workflow in Airflow, containerized with Docker and started via a docker-compose file. The starting point should be a .csv file, which I have available locally (or on GitHub). I do not yet understand how I can get this .csv into Airflow (I quickly realized that pd.read_csv does not work with the local path, since the code runs inside the container). What are my options, and which one is best?
(Most tutorials I have found import their DataFrames directly from sklearn or Kaggle, but that is not an option for me.)
Thanks already!
Upvotes: 2
Views: 2493
Reputation: 978
You have to add the .csv file to a volume that is mounted into the Airflow container.
If, for example, your DAGs are mounted like this in your docker-compose file:
volumes:
  - ./dags:/usr/local/airflow/dags
  - ./logs-volume:/usr/local/airflow/logs
then you can put your file directly at dags/file.csv on the host; inside the container it will show up at /usr/local/airflow/dags/file.csv.
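From a task you can then point pd.read_csv at that container-side path. Here is a minimal sketch (the names csv_example and load_csv and the start_date are just placeholders; the import path shown is the older Airflow 1.10 style, which matches the /usr/local/airflow layout above, while Airflow 2.x uses airflow.operators.python instead):

from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 2.x: from airflow.operators.python import PythonOperator

# Container-side path of the mounted file (host path: ./dags/file.csv)
CSV_PATH = "/usr/local/airflow/dags/file.csv"


def load_csv():
    # The file exists inside the container thanks to the volume mount,
    # so pd.read_csv works here with the container path.
    df = pd.read_csv(CSV_PATH)
    print(df.head())


with DAG(
    dag_id="csv_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    read_csv_task = PythonOperator(task_id="load_csv", python_callable=load_csv)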
To check that the file is actually available inside your container, start it and run the following:
docker ps
which will output something like:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8bffd2dad332 airflow:latest "/entrypoint.sh webs…" About an hour ago Up 6 seconds 5555/tcp, 8793/tcp, 0.0.0.0:8080->8080/tcp webserver_1
f65bf73811cb postgres:9.6 "docker-entrypoint.s…" 4 hours ago Up 7 seconds 0.0.0.0:53468->5432/tcp postgres_1
and then you can run:
docker exec -it 8bffd2dad332 /bin/bash
Then, inside the container, you can simply run ls to see your file:
ls dags/
Upvotes: 2