Arrow

Reputation: 3

How should I containerize a Python script which reads a CSV file?

I am running a simple Python script with pandas, which needs to read a CSV file to produce its output. I am able to run it manually; however, when I put the script into a container, it does not run.

I created a Dockerfile first, using gedit Dockerfile inside a folder named Python-test:

FROM python:3

RUN pip install pandas

WORKDIR /mydata

COPY TestCode.py ./

CMD python TestCode.py

Then I built an image named python-test using the build command:

docker build -t python-test .

Once it was built, I created a container and ran it:

docker run --name pytest -v ${PWD}:/data python-test

However, I am getting the following error:

Traceback (most recent call last):
  File "TestCode.py", line 5, in <module>
    df = pd.read_csv(r'/var/lib/docker/volumes/myvol/_data/Book1.csv')
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File /var/lib/docker/volumes/myvol/_data/Book1.csv does not exist: '/var/lib/docker/volumes/myvol/_data/Book1.csv'

The CSV file I am using is called Book1.csv.

What am I doing wrong, and how should I proceed?

Thank you.

Upvotes: 0

Views: 692

Answers (1)

kimbo

Reputation: 2693

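When you run your Docker container with -v ${PWD}:/data, you're mounting your current host directory at /data inside the container (a bind mount). If the file Book1.csv is in your current directory when you run this, it will be accessible inside the running container at /data/Book1.csv. The path in your script looks like the host-side storage location of a named volume, which does not exist inside the container.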

This part of the error

  File "TestCode.py", line 5, in <module>
    df = pd.read_csv(r'/var/lib/docker/volumes/myvol/_data/Book1.csv')

tells me you need to change line 5 of TestCode.py to something like this:

df = pd.read_csv('/data/Book1.csv')
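If you'd rather not hard-code the path, here is a minimal sketch of one way the script could locate the file. The environment variable DATA_DIR and the helper resolve_csv_path are names I made up for illustration, not anything Docker or pandas define; /data is simply the mount target from your docker run command.

```python
import os
from pathlib import Path


def resolve_csv_path(filename="Book1.csv", data_dir=None):
    """Return the CSV path inside the mounted directory, or fail with a hint."""
    # DATA_DIR is a hypothetical environment variable; the /data default
    # matches the target side of `-v ${PWD}:/data`.
    base = Path(data_dir or os.environ.get("DATA_DIR", "/data"))
    path = base / filename
    if not path.is_file():
        # List what the container can actually see, so a wrong mount is easy to spot.
        visible = sorted(p.name for p in base.iterdir()) if base.is_dir() else []
        raise FileNotFoundError(
            f"{path} not found; files visible in {base}: {visible}"
        )
    return path
```

In TestCode.py you would then call pd.read_csv(resolve_csv_path()). If the mount is wrong, the error message lists what is in the directory instead of only saying the file does not exist.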

Edit:

You asked me to explain a bit more. I'm no genius, so I would recommend reading the official documentation (https://docs.docker.com/engine/reference/builder/), but here's a short explanation of what you asked about.

First, your Dockerfile.

This first line means your base image is python:3. There are lots of publicly available images that are pre-built for specific use cases (like having Python and its dependencies already installed). (See https://hub.docker.com/_/python.)

FROM python:3

This runs the command pip install pandas while the image is being built, so pandas is already installed in the image.

RUN pip install pandas

This sets the working directory to /mydata for the instructions that follow, and for the command the container runs by default.

WORKDIR /mydata

This next line copies TestCode.py from your host machine to ./, which in this case is /mydata. So you'll end up with the file /mydata/TestCode.py in your Docker image.

COPY TestCode.py ./
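One consequence worth spelling out: because of WORKDIR /mydata, a relative path in the script resolves under /mydata, not under the /data mount. Here is a small sketch of that rule using only the standard library. The helper name resolve_in_container is made up for illustration, and it only mimics how a process started in the working directory resolves paths.

```python
from pathlib import PurePosixPath


def resolve_in_container(path, workdir="/mydata"):
    """Show where a path would point for a process started in `workdir`."""
    p = PurePosixPath(path)
    # Absolute paths ignore the working directory; relative ones are joined to it.
    return str(p if p.is_absolute() else PurePosixPath(workdir) / p)


print(resolve_in_container("TestCode.py"))      # /mydata/TestCode.py
print(resolve_in_container("Book1.csv"))        # /mydata/Book1.csv (not in the mount)
print(resolve_in_container("/data/Book1.csv"))  # /data/Book1.csv (inside the mount)
```

So the script either needs the absolute path /data/Book1.csv, or the directory has to be mounted at /mydata instead.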

The CMD part sets the default command that runs when a container is started from this image; here it runs python TestCode.py from the working directory. See https://docs.docker.com/engine/reference/builder/#cmd for more details.

CMD python TestCode.py

Next, the docker build command. See docs -> https://docs.docker.com/engine/reference/commandline/build/.

docker build -t python-test .

This means use the current directory (the trailing .) as the build context, build an image from the Dockerfile in it, and name the image python-test.

Finally, the docker run command. See docs -> https://docs.docker.com/engine/reference/commandline/run/.

docker run --name pytest -v ${PWD}:/data python-test

This means run a Docker container using the python-test image, name the container pytest, and mount your current directory at /data in the container. (See https://docs.docker.com/engine/reference/commandline/run/#mount-volume--v---read-only.)
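To make the mapping concrete, here is a rough sketch of how a host path translates to the path the container sees. It assumes POSIX-style paths (no Windows drive letters), the helper to_container_path is hypothetical, and /home/arrow/Python-test is just a stand-in for whatever ${PWD} expands to on your machine.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, volume_spec):
    """Translate a host path into the path seen inside the container.

    volume_spec is the value passed to -v, in the form "host_dir:container_dir".
    Returns None when the host path is outside the mounted directory.
    """
    host_dir, container_dir = volume_spec.split(":")[:2]
    try:
        relative = PurePosixPath(host_path).relative_to(host_dir)
    except ValueError:
        return None  # not under the mounted directory, so invisible in the container
    return str(PurePosixPath(container_dir) / relative)


# Hypothetical host directory standing in for ${PWD}.
spec = "/home/arrow/Python-test:/data"
print(to_container_path("/home/arrow/Python-test/Book1.csv", spec))  # /data/Book1.csv
print(to_container_path("/var/lib/docker/volumes/myvol/_data/Book1.csv", spec))  # None
```

The second call shows why the original path fails: it is not under the mounted directory, so nothing at that location is visible to the script.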

Again, the docs cover this much better than I do, so I'd take a look there.

Upvotes: 1
