ccasimiro9444

Reputation: 415

Install pandas in a Dockerfile

I am trying to create a Docker image. The Dockerfile is the following:

# Use the official Python 3.6.5 image
FROM python:3.6.5-alpine3.7

# Set the working directory to /app
WORKDIR /app

# Copy the requirements file and install the dependencies
COPY requirements.txt /app
RUN pip3 install --no-cache-dir -r requirements.txt

# Configuring access to Jupyter
RUN mkdir /notebooks
RUN jupyter notebook --no-browser --ip 0.0.0.0 --port 8888 /notebooks

The requirements.txt file is:

jupyter
numpy==1.14.3
pandas==0.23.0rc2
scipy==1.0.1
scikit-learn==0.19.1
pillow==5.1.1
matplotlib==2.2.2
seaborn==0.8.1

Running the command docker build -t standard . gives me an error when Docker is trying to install pandas. The error is the following:

Collecting pandas==0.23.0rc2 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/46/5c/a883712dad8484ef907a2f42992b122acf2bcecbb5c2aa751d1033908502/pandas-0.23.0rc2.tar.gz (12.5MB)
    Complete output from command python setup.py egg_info:
    /bin/sh: svnversion: not found
    /bin/sh: svnversion: not found
    non-existing path in 'numpy/distutils': 'site.cfg'
    Could not locate executable gfortran
    ... (loads of other stuff)
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xb6f6a5o/pandas/
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1

When I try to install an older version, pandas==0.22.0, I get this error:

Step 5/7 : RUN pip3 install --no-cache-dir -r requirements.txt
 ---> Running in 5810ea896689
Collecting jupyter (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/83/df/0f5dd132200728a86190397e1ea87cd76244e42d39ec5e88efd25b2abd7e/jupyter-1.0.0-py2.py3-none-any.whl
Collecting numpy==1.14.3 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/b0/2b/497c2bb7c660b2606d4a96e2035e92554429e139c6c71cdff67af66b58d2/numpy-1.14.3.zip (4.9MB)
Collecting pandas==0.22.0 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/08/01/803834bc8a4e708aedebb133095a88a4dad9f45bbaf5ad777d2bea543c7e/pandas-0.22.0.tar.gz (11.3MB)
  Could not find a version that satisfies the requirement Cython (from versions: )
No matching distribution found for Cython
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1

I also tried to install Cython and setuptools before pandas, but it gave the same "No matching distribution found for Cython" error at the pip3 install pandas line.

How can I get pandas installed?

Upvotes: 15

Views: 33073

Answers (7)

Ariel Szmerla

Reputation: 132

For Python 3.11, I successfully used this Dockerfile:

FROM python:3.11.1-alpine3.17
WORKDIR /your_path_to_dockerfile/

RUN apk add g++ postgresql-dev cargo gcc python3-dev libffi-dev musl-dev zlib-dev jpeg-dev

COPY requirements.txt ./
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

with the following requirements.txt: pandas

Upvotes: 0

RicHincapie

Reputation: 3973

Here is our Dockerfile, which builds fine with Python 3.9 and Alpine 3.13.

It is meant to work with PostgreSQL 12 via SQLAlchemy.

Kevin Smith's answer here was very helpful; this builds on it with some additions.

FROM python:3.9-alpine3.13

ENV MAIN_DIR=/home/my_dir

RUN mkdir "${MAIN_DIR}"

WORKDIR "${MAIN_DIR}"

RUN apk add --no-cache --update \
    python3-dev gcc \
    gfortran musl-dev g++ \
    libffi-dev openssl-dev \
    libxml2 libxml2-dev \
    libxslt libxslt-dev \
    libjpeg-turbo-dev zlib-dev \
    libpq postgresql-dev

COPY /requirements.txt "${MAIN_DIR}"

RUN pip install --upgrade cython \
    && pip install --upgrade pip \
    && pip install -r requirements.txt

And the requirements.txt:

pandas==1.2.3
SQLAlchemy==1.4.11
psycopg2-binary

Upvotes: 0

Kevin Smith

Reputation: 676

I realize this question has been answered, but I recently had a similar issue with numpy and pandas dependencies in a dockerized project, so I hope this will be of benefit to someone in the future.

My solution:

As pointed out by Aviv Sela, Alpine does not contain build tools by default, so they need to be added through the Dockerfile. Below is my Dockerfile with the build packages required for numpy and pandas to be installed successfully on Alpine.

FROM python:3.6-alpine3.7

RUN apk add --no-cache --update \
    python3 python3-dev gcc \
    gfortran musl-dev g++ \
    libffi-dev openssl-dev \
    libxml2 libxml2-dev \
    libxslt libxslt-dev \
    libjpeg-turbo-dev zlib-dev

RUN pip install --upgrade pip

ADD requirements.txt .
RUN pip install -r requirements.txt

The requirements.txt:

numpy==1.17.1
pandas==0.25.1

EDIT:

Add the following line to the Dockerfile, before the RUN command that upgrades pip. It is critical to the successful installation of pandas, as pointed out by Bishwas Mishra in a comment.

RUN pip install --upgrade cython
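
Putting the edit together with the Dockerfile above, the whole file would look roughly like this. It is only the two snippets from this answer combined, with comments added; I have not changed any package names or versions.

```dockerfile
FROM python:3.6-alpine3.7

# Build toolchain and headers needed to compile numpy and pandas from source
RUN apk add --no-cache --update \
    python3 python3-dev gcc \
    gfortran musl-dev g++ \
    libffi-dev openssl-dev \
    libxml2 libxml2-dev \
    libxslt libxslt-dev \
    libjpeg-turbo-dev zlib-dev

# Cython is installed before pip is upgraded, as described in the edit
RUN pip install --upgrade cython
RUN pip install --upgrade pip

# Install the pinned numpy and pandas versions
ADD requirements.txt .
RUN pip install -r requirements.txt
```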

Upvotes: 13

jersey bean

Reputation: 3639

Using a new version of Python that pandas does not yet support will cause problems.

I found it does not work with a development version of Python:

FROM python:3.9.0a6-buster


RUN apt-get update && \
    apt-get -y install python3-pandas

COPY requirements.txt ./
RUN pip3 install --no-cache-dir -r requirements.txt

requirements.txt:

numpy==1.18
pandas

I found it DOES work with an officially released version of Python:

FROM python:3.8-buster

Upvotes: 4

Rebeku

Reputation: 879

You're probably going to be better off building from a pandas image instead of base Python. This will make iteration much faster and easier, because you won't ever have to reinstall pandas. I like amancevice/pandas ( https://hub.docker.com/r/amancevice/pandas/tags ). There are Alpine and Debian images available for every pandas tag, although I think they may all be Python 3.7 now.
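
A minimal sketch of what building on that image might look like. The untagged FROM line, the /app directory, and the requirements.txt layout are assumptions for illustration; in practice you would pin one of the tags listed on the Docker Hub page.

```dockerfile
# Assumption: no tag given, so this pulls whatever "latest" currently is;
# pin a specific pandas tag from the Docker Hub page for reproducible builds
FROM amancevice/pandas

# Hypothetical working directory
WORKDIR /app

# pandas is already in the base image, so only the remaining packages are installed
# (if requirements.txt pins a different pandas version, pip will still try to replace it)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```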

Upvotes: 2

Aviv Sela

Reputation: 221

Alpine doesn't contain build tools by default. Install the build tools and create a symbolic link for the locale header:

$ apk add --update curl gcc g++
$ ln -s /usr/include/locale.h /usr/include/xlocale.h
$ pip install numpy

Based on https://wired-world.com/?p=100
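
In a Dockerfile, the same three commands could be chained in one RUN step, roughly like this. The base image tag is an assumption (it is the one from the question); the commands themselves are unchanged from above.

```dockerfile
# Assumption: the Alpine-based Python image from the question
FROM python:3.6.5-alpine3.7

# Install the compilers, add the xlocale.h symlink, then build numpy,
# all in a single layer
RUN apk add --update curl gcc g++ \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && pip install numpy
```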

Upvotes: 8

ccasimiro9444

Reputation: 415

I was able to create the Docker image. There must have been some version incompatibility between python:3.6.5-alpine3.7 and pandas. I changed the base image to FROM python:3, and then it worked fine (I also had to downgrade pillow to 5.1.0).
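
For reference, a sketch of the relevant part of the Dockerfile after that change. It is reconstructed from the Dockerfile in the question with only the FROM line swapped, and the Jupyter lines are left out. Note that python:3 is a moving tag, so it resolves to a newer Python today than it did when this was written, and the old version pins may not install on it.

```dockerfile
# Debian-based official image instead of python:3.6.5-alpine3.7
FROM python:3

WORKDIR /app

# requirements.txt as in the question, but with pillow==5.1.0
COPY requirements.txt /app
RUN pip3 install --no-cache-dir -r requirements.txt
```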

Upvotes: 1
