NotSoShabby

Reputation: 3728

"numpy.ufunc has the wrong size, try recompiling", even with the latest pandas and numpy versions

I'm using pandas in a container and I get the following error:

Traceback (most recent call last):
  File "/volumes/dependencies/site-packages/celery/app/trace.py", line 374, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/volumes/dependencies/site-packages/celery/app/trace.py", line 629, in __protected_call__
    return self.run(*args, **kwargs)
  File "/volumes/code/autoai/celery/data_template/api.py", line 16, in run_data_template_task
    data_template.run(data_bundle, columns=columns)
  File "/volumes/code/autoai/models/data_template.py", line 504, in run
    self.to_parquet(data_bundle, columns=columns)
  File "/volumes/code/autoai/models/data_template.py", line 162, in to_parquet
    }, parquet_path=data_file.path, directory="", dataset=self)
  File "/volumes/code/autoai/core/datasets/parquet_converter.py", line 46, in convert
    file_system.write_dataframe(parquet_path, chunk, directory, append=append)
  File "/volumes/code/autoai/core/file_systems.py", line 76, in write_dataframe
    append=append)
  File "/volumes/dependencies/site-packages/pandas/core/frame.py", line 1945, in to_parquet
    compression=compression, **kwargs)
  File "/volumes/dependencies/site-packages/pandas/io/parquet.py", line 256, in to_parquet
    impl = get_engine(engine)
  File "/volumes/dependencies/site-packages/pandas/io/parquet.py", line 40, in get_engine
    return FastParquetImpl()
  File "/volumes/dependencies/site-packages/pandas/io/parquet.py", line 180, in __init__
    import fastparquet
  File "/volumes/dependencies/site-packages/fastparquet/__init__.py", line 8, in <module>
    from .core import read_thrift
  File "/volumes/dependencies/site-packages/fastparquet/core.py", line 13, in <module>
    from . import encoding
  File "/volumes/dependencies/site-packages/fastparquet/encoding.py", line 11, in <module>
    from .speedups import unpack_byte_array
  File "__init__.pxd", line 861, in init fastparquet.speedups
ValueError: numpy.ufunc has the wrong size, try recompiling. Expected 192, got 216

I read in other answers that this message shows up when pandas is compiled against a newer numpy version than the one you have installed. But updating both pandas and numpy did not work for me. I tried to find out whether I have multiple versions of numpy installed, but pip show numpy only shows the latest version.

Also, strangely, this happens only when I deploy locally and not on the server.

Any ideas how to go about fixing this? Or at least how to debug my numpy and pandas versions (if there are multiple versions installed, how do I check that)?

I tried upgrading both packages, and also removing and reinstalling them. No help there.
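One way to debug the "multiple versions" suspicion is to print, for each package involved, the version and the file it was actually imported from; a second copy earlier on sys.path shows up as an unexpected path. A minimal sketch (the package list is just the ones from the traceback):

```python
# Print version and install location of each package involved, to spot a
# second (shadowed) copy of numpy elsewhere on sys.path.
import importlib

for name in ("numpy", "pandas", "fastparquet"):
    try:
        mod = importlib.import_module(name)
        print(name, getattr(mod, "__version__", "?"), mod.__file__)
    except ImportError as exc:
        print(name, "not importable:", exc)
```

If the printed path is not under the site-packages directory that pip reports (pip show numpy, "Location:" line), the interpreter is picking up a different copy than the one pip manages.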

Upvotes: 16

Views: 15407

Answers (7)

kbenda

Reputation: 440

I had the same issue as above. My solution was to install Python 2.7 from the official website: https://www.python.org/downloads/release/python-2713/

Upvotes: 0

NotSoShabby

Reputation: 3728

The answer was that fastparquet (a package used by pandas for parquet files) was, for some reason, linked against an older numpy binary.

Updating that package helped. If someone else runs into this problem, updating all the related packages that use numpy is the right way to go.
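The mismatch surfaces at import time, because that is when the compiled extension checks numpy's binary layout. A small sketch to confirm whether the installed fastparquet matches the current numpy (abi_ok is my own helper name, not a real API):

```python
# Distinguish "imports cleanly" from the compiled-extension ValueError that
# fastparquet raises when its binary was built against a different numpy.
import importlib

def abi_ok(module_name):
    """True if the module imports cleanly, False on an ABI ValueError,
    None if the package is not installed at all."""
    try:
        importlib.import_module(module_name)
        return True
    except ValueError:
        return False
    except ImportError:
        return None

print(abi_ok("fastparquet"))  # False here means: reinstall/upgrade fastparquet
```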

Upvotes: 5

Jason D

Reputation: 303

Well, my problem was actually somehow solved by

 pip uninstall numpy
 pip install numpy

The actual session was:

➜  ~ pip3 uninstall numpy -y
Uninstalling numpy-1.14.5:
  Successfully uninstalled numpy-1.14.5
➜  ~ pip3 install numpy     
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (1.16.1)
➜  ~ pip3 uninstall numpy   
Uninstalling numpy-1.16.1:
  Would remove:
    /usr/bin/f2py3
    /usr/bin/f2py3.7
    /usr/lib/python3/dist-packages/numpy
    /usr/lib/python3/dist-packages/numpy-1.16.1.egg-info
Proceed (y/n)? y
  Successfully uninstalled numpy-1.16.1
➜  ~ pip3 install numpy   
Collecting numpy...

which suggests the problem might have been a version conflict between the pip-installed numpy and the one in dist-packages?

Upvotes: 7

gavinest

Reputation: 348

I had the same issue and tried all the above responses (at the time of writing). The only thing that worked for me was switching to pyarrow.

I then made sure to specify the pyarrow engine when using parquet in pandas, although according to the docs pandas should already prefer the pyarrow engine over fastparquet.

import pandas as pd

pd.read_parquet('./path', engine='pyarrow')

Upvotes: 0

voglster

Reputation: 833

TLDR: If you're using Docker, add

RUN pip install numpy

before you install pandas (probably just your pip install -r requirements.txt) and it will just work again.

I am building pandas on Alpine in Docker and ran into the same issue; it JUST popped up (around Dec 27th, 2018) for a build that had been working just fine previously.

Upvotes: 11

manfred

Reputation: 1

I had the same issue with pandas. The problem was solved by the following workaround:

pip uninstall --yes numpy

easy_install --upgrade numpy

Upvotes: -1

yann

Reputation: 662

Make sure that the right version of numpy is installed in /volumes/dependencies/site-packages/ and that it is the one actually being used.

Upvotes: 0
