ModuleNotFoundError: No module named 'pyarrow._dataset'

Question

I decided to familiarize myself with the Arrow package, and I figured it would be a good idea to run one of its usage examples (https://github.com/apache/arrow/tree/master/python/examples/minimal_build):

docker build -t arrow_ubuntu_minimal -f Dockerfile.ubuntu .
docker run --rm -t -i -v $PWD:/io arrow_ubuntu_minimal /io/build_venv.sh

Unfortunately, after running the latter command, the console prints:

E   ModuleNotFoundError: No module named 'pyarrow._dataset'

pyarrow/dataset.py:23: ModuleNotFoundError
====================================================================================== warnings summary ======================================================================================
pyarrow/tests/test_serialization.py:283
  /root/arrow/python/pyarrow/tests/test_serialization.py:283: PytestDeprecationWarning: @pytest.yield_fixture is deprecated.
  Use @pytest.fixture instead; they are the same.
    @pytest.yield_fixture(scope='session')

pyarrow/tests/test_pandas.py::TestConvertListTypes::test_infer_lists
pyarrow/tests/test_pandas.py::TestConvertListTypes::test_to_list_of_structs_pandas
pyarrow/tests/test_pandas.py::TestConvertListTypes::test_nested_large_list
  /root/venv/lib/python3.6/site-packages/pandas/core/dtypes/missing.py:475: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
    if np.any(np.asarray(left_value != right_value)):

pyarrow/tests/test_pandas.py::TestConvertListTypes::test_nested_large_list
  /root/venv/lib/python3.6/site-packages/pandas/core/dtypes/missing.py:475: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
    if np.any(np.asarray(left_value != right_value)):

-- Docs: https://docs.pytest.org/en/stable/warnings.html
================================================================================== short test summary info ===================================================================================
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_filesystem - ModuleNotFoundError: No module named 'pyarrow._dataset'
============================================================ 1 failed, 3168 passed, 689 skipped, 16 xfailed, 5 warnings in 48.01s ============================================================

Has anyone run into a similar issue, or does anyone have an idea how to fix it?

I am currently using Ubuntu 20.04. Maybe this causes the problem, since the example is set up for Ubuntu 18.04, but I see no way of checking that.
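To narrow this down, it may help to check from inside the built virtualenv whether the compiled extension module exists at all. The traceback shows pyarrow/dataset.py failing to import pyarrow._dataset, which would be consistent with a build that did not include the dataset component (this is an assumption, not something the log confirms). Below is a minimal sketch using only the standard library; the helper name find_missing_modules and the list of module names to probe are illustrative, not part of pyarrow.

```python
import importlib.util


def find_missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            # For a dotted name, find_spec imports the parent package first.
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when the parent package itself cannot be imported.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Hypothetical list of modules to probe; "pyarrow._dataset" is taken from
# the traceback above, "pyarrow.lib" is pyarrow's core extension module.
MODULES_TO_CHECK = ["pyarrow.lib", "pyarrow._dataset"]

if __name__ == "__main__":
    for name in find_missing_modules(MODULES_TO_CHECK):
        print("missing:", name)
```

If only pyarrow._dataset is reported as missing, the package itself built fine and the optional dataset extension was probably left out of the build, which would point at the build configuration rather than at the host Ubuntu version.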

