Geoff

Reputation: 71

Apache Beam in Python, error with beam.io.TextFileSource

I'm trying to run the code in the Data Science on GCP repo and keep hitting an error in the Beam code.

This is the line that gives an error: beam.Read(beam.io.TextFileSource('airports.csv.gz'))

Here's the error I'm getting: AttributeError: 'module' object has no attribute 'TextFileSource'

Here's the complete file: https://github.com/GoogleCloudPlatform/data-science-on-gcp/blob/master/04_streaming/simulate/df01.py

Does anyone know how to get this working, or what I'm missing?

Upvotes: 0

Views: 1129

Answers (2)

Mike Traffanstead

Reputation: 11

Google Dataflow is migrating to the Apache Beam standard, which means you should use apache_beam.io.textio.ReadFromText instead of TextFileSource. The standard is still evolving, so it's best to consult the release notes whenever you upgrade the package.
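
For what it's worth, here is a minimal sketch of what the read step might look like with ReadFromText. It assumes a recent apache-beam release on Python 3. The function name read_airports and the step labels are made up for illustration, and this is not the repo's actual code. By default ReadFromText infers compression from the file extension, so the .gz file should not need extra arguments.

```python
def read_airports(path='airports.csv.gz'):
    """Read a (possibly gzipped) CSV line by line and print its fields."""
    # Imported inside the function so this snippet still loads on a machine
    # where Beam is not installed (an illustrative choice, not a requirement).
    import apache_beam as beam

    # The context manager runs the pipeline when the block exits.
    with beam.Pipeline() as pipeline:
        (pipeline
         # ReadFromText replaces beam.Read(beam.io.TextFileSource(...)).
         | 'read' >> beam.io.ReadFromText(path)
         # Naive split on commas; quoted fields would need the csv module.
         | 'fields' >> beam.Map(lambda line: line.split(','))
         | 'print' >> beam.Map(print))
```

Calling read_airports() from the directory that holds airports.csv.gz should print one list of fields per line of the file.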

Upvotes: 1

Lak

Reputation: 4166

It appears that you are using an older version of apache-beam/cloud-dataflow.

Do:

pip freeze | grep dataflow

When I do this, I get:

google-cloud-dataflow==0.4.3

If the version you get is older, try:

pip install google-cloud-dataflow

and repeat the pip freeze command. If you keep getting an older version, then you are in Python library hell and I suggest using virtualenv to ensure that you are using the latest version of all packages ...
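
If you'd rather check from inside Python (for example, to confirm which environment your script is really running in), something like the sketch below should work. It assumes Python 3.8 or later for importlib.metadata, and the helper name installed_version is just for illustration. It looks up both distribution names mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a distribution, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this interpreter's environment.
        return None


# Report both the legacy Dataflow SDK name and the Apache Beam name.
for name in ('google-cloud-dataflow', 'apache-beam'):
    print(name, installed_version(name))
```

If this reports a different version than pip freeze does, the script and pip are probably using different environments, which is the situation virtualenv is meant to avoid.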

Upvotes: 0
