data confusion

Reputation: 33

Google Dataflow python quickstart error - GcsIO has no attribute

I have been following the Dataflow Python Quickstart and get an error when running the wordcount example pipeline:

...
  File "apache_beam/io/fileio.py", line 281, in glob
    return gcsio.GcsIO().glob(path, limit)
AttributeError: 'NoneType' object has no attribute 'GcsIO'

I have tried my own pipeline with the same result. I am not sure what the problem is, as I thought I had followed the tutorial exactly. The error seems to be related to the read/write transform. The full traceback is:

Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Users/Alex/beam/sdks/python/apache_beam/examples/wordcount.py", line 116, in <module>
    run()
  File "/Users/Alex/beam/sdks/python/apache_beam/examples/wordcount.py", line 87, in run
    lines = p | 'read' >> ReadFromText(known_args.input)
  File "apache_beam/io/textio.py", line 378, in __init__
    skip_header_lines=skip_header_lines)
  File "apache_beam/io/textio.py", line 87, in __init__
    validate=validate)
  File "apache_beam/io/filebasedsource.py", line 97, in __init__
    self._validate()
  File "apache_beam/io/filebasedsource.py", line 171, in _validate
    if len(fileio.ChannelFactory.glob(self._pattern, limit=1)) <= 0:
  File "apache_beam/io/fileio.py", line 281, in glob
    return gcsio.GcsIO().glob(path, limit)
AttributeError: 'NoneType' object has no attribute 'GcsIO'

Any idea what I am doing wrong?

Thanks

Upvotes: 3

Views: 1793

Answers (2)

Juve

Reputation: 10824

Installing google-apitools alone did not solve the problem for me. I had to install the SDK directly from source, including its gcp dependencies, which are defined in the requires.txt found in the SDK's egg-info:

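# run this in your virtualenv, from the root of the Beam source checkout
SDK_PATH=sdks/python
# quote the argument so the shell does not treat [gcp] as a glob pattern
pip install -e "${SDK_PATH}[gcp]"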

After logging in with gcloud auth application-default login, I could run the wordcount example successfully.
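To confirm that the login step left credentials behind, something like the following could be used. It is only a sketch: it assumes the default location that gcloud uses on Linux and macOS (~/.config/gcloud/application_default_credentials.json) and the GOOGLE_APPLICATION_CREDENTIALS environment variable, and the helper name find_adc_file is made up for this example.

```python
import os


def find_adc_file(environ=None, home=None):
    """Return the first existing credentials file path, or None.

    Hypothetical helper: it only looks at the GOOGLE_APPLICATION_CREDENTIALS
    variable and the assumed default gcloud location on Linux/macOS.
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home

    candidates = []
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(explicit)
    # Assumed default path written by "gcloud auth application-default login".
    candidates.append(
        os.path.join(home, ".config", "gcloud",
                     "application_default_credentials.json"))

    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_adc_file() or "no application default credentials file found")
```

A result of None does not prove that authentication will fail (other credential sources exist), but it is a quick first check.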

Edit: Answer rewritten, since previous solution was not working as expected.

Upvotes: 1

Pablo

Reputation: 11031

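This happens because you do not have the google-apitools package installed. (This is mentioned in the code, but it should be better documented.) Without the package, the name gcsio ends up as None, which is why the call to gcsio.GcsIO() fails with an AttributeError.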
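The error message fits the common optional-import pattern, where a module name falls back to None if its dependency cannot be imported. The following is a minimal sketch of that pattern, not the SDK's actual source; the module name some_missing_gcs_dependency and the function glob_checked are invented for illustration.

```python
# Optional import: fall back to None when the dependency is missing.
try:
    import some_missing_gcs_dependency as gcsio  # hypothetical module name
except ImportError:
    gcsio = None


def glob(path, limit=None):
    # Same shape as the failing call in the traceback: with gcsio set to
    # None, attribute access raises AttributeError on NoneType.
    return gcsio.GcsIO().glob(path, limit)


def glob_checked(path, limit=None):
    # A friendlier variant that names the real problem up front.
    if gcsio is None:
        raise ImportError(
            "GCS support is unavailable; install the GCP dependencies")
    return gcsio.GcsIO().glob(path, limit)
```

Calling glob("gs://bucket/*") here raises the same AttributeError as in the question, while glob_checked reports the missing dependency instead.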

Try running pip install google-apitools in your virtual environment, then rerun the pipeline. Note that you also need Google Cloud credentials on your system.
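Before rerunning the pipeline, you can check whether the package is importable from the interpreter that runs it. This is a small sketch; missing_modules is a made-up helper name, and apitools is the import name that google-apitools installs.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "apitools" is the import name provided by the google-apitools package.
    missing = missing_modules(["apitools"])
    if missing:
        print("Not importable: " + ", ".join(missing))
    else:
        print("google-apitools is importable")
```

Run it with the same virtualenv active as the pipeline, since a package installed in a different environment will not be found.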

Upvotes: 2
