Galuoises

Reputation: 3283

Cannot import pyarrow in pyspark

I am trying to use pyarrow with pyspark. However, when I try to execute

    import pyarrow

I receive the following error:

    In [1]: import pyarrow
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-1-f1048abcb32d> in <module>
    ----> 1 import pyarrow

    ~/opt/anaconda3/lib/python3.7/site-packages/pyarrow/__init__.py in <module>
         47 import pyarrow.compat as compat
         48
    ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
         50 from pyarrow.lib import (null, bool_,
         51                          int8, int16, int32, int64,

    ImportError: dlopen(/Users/user/opt/anaconda3/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libboost_filesystem.dylib
      Referenced from: /Users/user/opt/anaconda3/lib/libarrow.15.1.0.dylib
      Reason: image not found

I have tried installing pyarrow in a conda environment and downgrading to Python 3.6, but without success.

Does anyone have a suggestion for solving this problem?

Upvotes: 5

Views: 8608

Answers (2)

Gonzalo Garcia

Reputation: 6612

The accepted answer didn't work for me because I'm on macOS. After some research, the fix that helped me was the following, for those who have the same problem on macOS:

    brew update && brew upgrade
    brew switch openssl 1.0.2s

This worked for me on macOS Catalina 10.15.4.

Upvotes: 2

Thomas Martin

Reputation: 686

It looks like PyArrow was not installed properly. Try cleaning out the older packages and then installing pyarrow again with the command below:

    conda install -c conda-forge pyarrow
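To check whether the reinstall actually fixed the import, something like the following could be run in the same environment. This is only a minimal sketch: `check_import` is a hypothetical helper name used for illustration, not part of pyarrow or pyspark, and it assumes the failure shows up as an `ImportError`, as in the traceback in the question.

```python
import importlib


def check_import(module_name):
    """Try to import a module and return (ok, detail).

    check_import is a hypothetical helper for illustration only.
    On success, detail is the module's __version__ if it has one;
    on failure, detail is the ImportError message.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # A dlopen "Library not loaded" failure is raised as ImportError,
        # so the message here names the missing shared library.
        return False, str(exc)
    return True, getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    ok, detail = check_import("pyarrow")
    if ok:
        print("pyarrow imported, version:", detail)
    else:
        print("pyarrow import failed:", detail)
```

If the import still fails, the printed message should name the library that could not be loaded, which helps tell a broken install apart from a missing package.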

Upvotes: 3
