Reputation: 71
I'm trying to run the code in the Data Science on GCP repo and keep hitting an error in the Beam code.
This is the line that raises the error: beam.Read(beam.io.TextFileSource('airports.csv.gz'))
Here's the error I'm getting: AttributeError: 'module' object has no attribute 'TextFileSource'
Here's the complete file: https://github.com/GoogleCloudPlatform/data-science-on-gcp/blob/master/04_streaming/simulate/df01.py
Does anyone know how to get this working, or what I'm missing?
Upvotes: 0
Views: 1129
Reputation: 11
The Google Cloud Dataflow SDK is migrating to Apache Beam, which means you should be using apache_beam.io.textio.ReadFromText instead of beam.Read(beam.io.TextFileSource(...)). The Beam API is still evolving, so it's best to consult the release notes whenever you upgrade the package.
Upvotes: 1
Reputation: 4166
It appears that you are using an older version of apache-beam/cloud-dataflow.
Do:
pip freeze | grep dataflow
When I do this, I get:
google-cloud-dataflow==0.4.3
If the version you get is older, upgrade it (a plain pip install leaves an already-installed package alone):
pip install --upgrade google-cloud-dataflow
and repeat the pip freeze command. If you keep getting an older version, then you are in Python library hell, and I suggest using virtualenv to ensure that you are using the latest version of all packages.
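If you would rather do the same check from inside Python (for example in a notebook), here is a rough equivalent of pip freeze | grep dataflow. It is a sketch that assumes Python 3.8+ for importlib.metadata. The helper names are hypothetical, and the 0.4.3 threshold is just the version reported above, not a documented minimum.

```python
from importlib import metadata


def installed_version(dist_name):
    """Rough equivalent of `pip freeze | grep <name>`: return the installed
    version string, or None if the distribution is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.4.3' into (0, 4, 3). Only plain dotted integers are handled;
    pre-release tags such as '2.0.0rc1' are not supported by this sketch."""
    return tuple(int(part) for part in version.split('.'))


def is_older(version, reference):
    """True if `version` sorts strictly before `reference`."""
    return version_tuple(version) < version_tuple(reference)


if __name__ == '__main__':
    current = installed_version('google-cloud-dataflow')
    if current is None:
        print('google-cloud-dataflow is not installed')
    elif is_older(current, '0.4.3'):  # illustrative threshold only
        print('Found %s; try: pip install --upgrade google-cloud-dataflow' % current)
    else:
        print('Found %s' % current)
```

Comparing integer tuples rather than raw strings avoids treating '0.10.0' as older than '0.9.0'.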
Upvotes: 0