Yao Yan

Reputation: 73

How to read a CSV file using pandas in a Docker container

I tried to dockerize my machine learning model, which is written in Python. The script uses pandas to load CSV files. When I run the image in a container, the pd.read_csv("FILENAME.csv") command can't find the CSV file (I think the problem might be that the CSV file is not in the container). Any suggestions on how to run this Python script and read the CSV files in Docker?

Dockerfile:

FROM python:latest
RUN pip install pandas
RUN pip install numpy
RUN pip install scikit-learn
COPY . /app
ENTRYPOINT ["python", "app/model1.py","death_clean.csv","condition_data_clean.csv"]

model1.py

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df1=pd.read_csv("/Users/yaoyan/Desktop/docker-trial/condition_data_clean.csv",error_bad_lines=False)
df2=pd.read_csv("/Users/yaoyan/Desktop/docker-trial/death_clean.csv",error_bad_lines=False)

df=pd.merge(df1,df2,on=['person_id'], how='left')

When I ran it, I got this error:

FileNotFoundError: File b'/Users/yaoyan/Desktop/docker-trial/condition_data_clean.csv' does not exist

Upvotes: 4

Views: 6313

Answers (1)

Szymon Maszke

Reputation: 24701

You should create a volume containing your data using the docker volume command. After this step, mount that storage with the -v option of docker run, e.g. -v my_data_volume:/data. Lastly, change the path in your Python script to match; in this case it would be /data/my_csv.csv. More information is available in the Docker documentation.
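As a rough illustration of the path change, the script could build its CSV paths from the mount point instead of hard-coding a host path. This is only a sketch: the data_path helper and the DATA_DIR environment variable are hypothetical names introduced here, and /data is assumed to be the mount target from the -v option above.

```python
import os
from pathlib import Path


def data_path(name, default_dir="/data"):
    """Return the path of a data file inside the container.

    DATA_DIR is a hypothetical environment variable (not part of the
    original question); if it is unset, the /data mount point is assumed.
    """
    base = Path(os.environ.get("DATA_DIR", default_dir))
    return base / name


# Assumed usage inside model1.py (requires pandas):
#   import pandas as pd
#   df1 = pd.read_csv(data_path("condition_data_clean.csv"))
#   df2 = pd.read_csv(data_path("death_clean.csv"))
```

With this layout the same script can also run outside Docker by pointing DATA_DIR at a local folder. Mounting a host directory directly, with -v followed by an absolute host path and :/data, should work the same way from the script's point of view.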

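Or, if you insist on copying the files into the image, use the path /app/condition_data_clean.csv in pandas' read_csv function (and /app/death_clean.csv for the second file), since COPY . /app places them under /app.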
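If the files are copied into the image, one way to avoid hard-coding /app is to resolve them relative to the script itself. The sketch below assumes the CSV files sit in the same directory as model1.py (which is the case with COPY . /app when they are in the build context); bundled_csv is a hypothetical helper name.

```python
from pathlib import Path


def bundled_csv(name, script_file):
    """Return the path of a file located next to the given script.

    script_file is expected to be the script's own __file__ value.
    """
    return Path(script_file).parent / name


# Assumed usage inside model1.py (requires pandas):
#   import pandas as pd
#   df1 = pd.read_csv(bundled_csv("condition_data_clean.csv", __file__))
#   df2 = pd.read_csv(bundled_csv("death_clean.csv", __file__))
```

Because the path follows the script, the same code works both in the container (under /app) and on the host, as long as the CSV files stay next to model1.py.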

Upvotes: 4
