Sanjana Moghe

Reputation: 41

Missing optional dependency 'gcsfs'. The gcsfs library is required to handle GCS files Use pip or conda to install gcsfs

I am trying to read a CSV file that is stored in a GCS bucket into a dataframe using Dataflow. The job is failing with the error:

raise_with_traceback
    raise exc.with_traceback(traceback)
  File "apache_beam/runners/common.py", line 1213, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 570, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "/home/curate_try_final.py", line 71, in <lambda>
  File "/home/curate_try_final.py", line 66, in convert_to_parquet
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 440, in _read
    filepath_or_buffer, encoding, compression
  File "/usr/local/lib/python3.7/site-packages/pandas/io/common.py", line 213, in get_filepath_or_buffer
    from pandas.io import gcs
  File "/usr/local/lib/python3.7/site-packages/pandas/io/gcs.py", line 5, in <module>
    "gcsfs", extra="The gcsfs library is required to handle GCS files"
  File "/usr/local/lib/python3.7/site-packages/pandas/compat/_optional.py", line 93, in import_optional_dependency
    raise ImportError(message.format(name=name, extra=extra)) from None
ImportError: Missing optional dependency 'gcsfs'. The gcsfs library is required to handle GCS files Use pip or conda to install gcsfs. [while running 'ConvertToParquet']

The same error appears even after installing the library with:

sudo pip3 install gcsfs

Could someone please help?

Upvotes: 4

Views: 14526

Answers (1)

Rally H

Reputation: 142

The GCSFS library can be installed using conda or pip:

conda install -c conda-forge gcsfs

or

pip install gcsfs

or by cloning the repository:

git clone https://github.com/dask/gcsfs/
cd gcsfs/
pip install .
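
After installing, it may help to confirm that gcsfs is visible to the same Python interpreter that runs the pipeline, since a sudo pip3 install can end up in a different environment than the one executing the job. The following is a minimal sketch, not part of the original answer; the helper name is_installed is hypothetical, and it only checks the interpreter you run it with, not any remote workers.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if this interpreter can locate the top-level module."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Show which interpreter is being checked, then report on gcsfs.
    print("Interpreter:", sys.executable)
    print("gcsfs importable:", is_installed("gcsfs"))
```

If this reports False, installing with the interpreter's own pip (for example, python3 -m pip install gcsfs) targets that exact environment.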

More detailed information can be found here.

Upvotes: 5
