Reputation: 3728
I'm using pandas in a container and I get the following error:
Traceback (most recent call last):
File "/volumes/dependencies/site-packages/celery/app/trace.py", line 374, in trace_task
R = retval = fun(*args, **kwargs)
File "/volumes/dependencies/site-packages/celery/app/trace.py", line 629, in __protected_call__
return self.run(*args, **kwargs)
File "/volumes/code/autoai/celery/data_template/api.py", line 16, in run_data_template_task
data_template.run(data_bundle, columns=columns)
File "/volumes/code/autoai/models/data_template.py", line 504, in run
self.to_parquet(data_bundle, columns=columns)
File "/volumes/code/autoai/models/data_template.py", line 162, in to_parquet
}, parquet_path=data_file.path, directory="", dataset=self)
File "/volumes/code/autoai/core/datasets/parquet_converter.py", line 46, in convert
file_system.write_dataframe(parquet_path, chunk, directory, append=append)
File "/volumes/code/autoai/core/file_systems.py", line 76, in write_dataframe
append=append)
File "/volumes/dependencies/site-packages/pandas/core/frame.py", line 1945, in to_parquet
compression=compression, **kwargs)
File "/volumes/dependencies/site-packages/pandas/io/parquet.py", line 256, in to_parquet
impl = get_engine(engine)
File "/volumes/dependencies/site-packages/pandas/io/parquet.py", line 40, in get_engine
return FastParquetImpl()
File "/volumes/dependencies/site-packages/pandas/io/parquet.py", line 180, in __init__
import fastparquet
File "/volumes/dependencies/site-packages/fastparquet/__init__.py", line 8, in <module>
from .core import read_thrift
File "/volumes/dependencies/site-packages/fastparquet/core.py", line 13, in <module>
from . import encoding
File "/volumes/dependencies/site-packages/fastparquet/encoding.py", line 11, in <module>
from .speedups import unpack_byte_array
File "__init__.pxd", line 861, in init fastparquet.speedups
ValueError: numpy.ufunc has the wrong size, try recompiling. Expected 192, got 216
I read in other answers that this message shows up when pandas is compiled against a newer numpy version than the one you have installed. But updating both pandas and numpy did not work for me. I tried to find out whether I have several versions of numpy, but pip show numpy seems to show only the latest version.
Also, oddly, this happens only when I deploy locally and not on the server.
Any ideas on how to fix this? Or at least how to debug my numpy and pandas versions (and if there are multiple versions installed, how do I check for that)?
I tried upgrading both packages and removing and reinstalling them. No help there.
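One way to answer the "how do I debug my numpy version" part: ask the interpreter itself which numpy it actually imports and from where. A rough sketch, run with the same interpreter the container uses:

```python
# Debugging sketch: show which numpy the current interpreter imports,
# and from which directory it is loaded.
import importlib.util

spec = importlib.util.find_spec("numpy")
if spec is None:
    print("numpy is not importable at all")
else:
    import numpy
    print("version:", numpy.__version__)
    print("loaded from:", numpy.__file__)
```

If the printed path is not the one you expect (e.g. a stale copy outside /volumes/dependencies/site-packages/), that is a strong hint of a conflicting install.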
Upvotes: 16
Views: 15407
Reputation: 440
I had the same issue as above. My solution was to install Python 2.7 from the official website: https://www.python.org/downloads/release/python-2713/
Upvotes: 0
Reputation: 3728
The answer was that fastparquet (a package used by pandas) had been built against an older numpy binary for some reason.
Updating that package helped. If someone else runs into this problem, updating all the related packages (the ones that use numpy) is probably the right way to go.
Upvotes: 5
Reputation: 303
My problem was actually solved, somehow, by:
pip uninstall numpy
pip install numpy
The real process was:
➜ ~ pip3 uninstall numpy -y
Uninstalling numpy-1.14.5:
Successfully uninstalled numpy-1.14.5
➜ ~ pip3 install numpy
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (1.16.1)
➜ ~ pip3 uninstall numpy
Uninstalling numpy-1.16.1:
Would remove:
/usr/bin/f2py3
/usr/bin/f2py3.7
/usr/lib/python3/dist-packages/numpy
/usr/lib/python3/dist-packages/numpy-1.16.1.egg-info
Proceed (y/n)? y
Successfully uninstalled numpy-1.16.1
➜ ~ pip3 install numpy
Collecting numpy...
which means the problem was probably a version conflict: one numpy had been installed with pip (1.14.5) and another as a system package under /usr/lib/python3/dist-packages (1.16.1).
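A quick way to check for duplicate copies like this (one under site-packages, one under dist-packages) is to scan sys.path yourself. A rough sketch, not from the original answer:

```python
# Sketch: list every directory on sys.path that contains a given package,
# to spot duplicate installations (e.g. site-packages vs dist-packages).
import os
import sys

def find_copies(package):
    hits = []
    for entry in sys.path:
        candidate = os.path.join(entry, package)
        if os.path.isdir(candidate):
            hits.append(candidate)
    return hits

for path in find_copies("numpy"):
    print(path)  # more than one line printed suggests conflicting installs
```

Uninstalling until no copy remains, then reinstalling once, leaves a single consistent version.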
Upvotes: 7
Reputation: 348
I had the same issue and tried all of the above responses (at the time of writing). The only thing that worked for me was switching to pyarrow.
I then made sure to specify the pyarrow engine when using parquet in pandas, although according to the docs pandas should try the pyarrow engine before fastparquet anyway.
pd.read_parquet('./path', engine='pyarrow')
Upvotes: 0
Reputation: 833
TL;DR: If you're using Docker, add
RUN pip install numpy
before you install pandas (probably just before your pip install -r requirements.txt) and it will just work again.
I am building pandas in Alpine in Docker and ran into the same issue; it just popped up (around Dec 27th, 2018) for a build that had been working fine previously.
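Expanded into a minimal Dockerfile sketch; the base image and requirements file are assumptions for illustration, not the answerer's actual setup:

```dockerfile
FROM python:3.7-alpine

# Install numpy first so packages that compile against it
# (pandas, fastparquet) all see the same binary.
RUN pip install numpy

COPY requirements.txt .
RUN pip install -r requirements.txt
```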
Upvotes: 11
Reputation: 1
I had the same issue with pandas. The problem was solved by the following workaround:
pip uninstall --yes numpy
easy_install --upgrade numpy
Upvotes: -1
Reputation: 662
Make sure that the right version of numpy is installed in /volumes/dependencies/site-packages/ and that it is the one actually being used.
Upvotes: 0