Hu.Cai

Google Dataflow shows AttributeError: 'module' object has no attribute 'Read'

I am using Google Cloud for some testing. I followed this guide to run batch predictions against BigQuery: https://cloud.google.com/solutions/using-cloud-dataflow-for-batch-predictions-with-tensorflow

When I run the script:

python prediction/run.py \
--runner DataflowRunner \
--project $PROJECT \
--staging_location $BUCKET/staging \
--temp_location $BUCKET/temp \
--job_name $PROJECT-prediction-bq \
--setup_file prediction/setup.py \
--model $BUCKET/model \
--source bq \
--input $PROJECT:mnist.images \
--output $PROJECT:mnist.predict

I get this error:

Traceback (most recent call last):
  File "prediction/run.py", line 23, in <module>
    predict.run()
  File "/home/ahuoo_com/dataflow-prediction-example/prediction/modules/predict.py", line 98, in run
    images = p | 'ReadFromBQ' >> beam.Read(beam.io.BigQuerySource(known_args.input))
AttributeError: 'module' object has no attribute 'Read'

It looks like the apache_beam package doesn't contain an attribute named 'Read'. I think the example Google provides on GitHub may be wrong. You can see the failing code at line 98 of:

https://github.com/GoogleCloudPlatform/dataflow-prediction-example/blob/master/prediction/modules/predict.py

Has anyone else tested with this guide?

Answers (1)

Willian Fuks

You are right, there's a small mistake in the code. On line 98, where it says:

images = p | 'ReadFromBQ' >> beam.Read(beam.io.BigQuerySource(known_args.input))

It should be:

images = p | 'ReadFromBQ' >> beam.io.Read(beam.io.BigQuerySource(known_args.input))

Also, on line 100, where it says:

predictions | 'WriteToBQ' >> beam.Write(beam.io.BigQuerySink(...))

it should likewise be:

predictions | 'WriteToBQ' >> beam.io.Write(beam.io.BigQuerySink(...))

The Read and Write transforms for PCollections come from the io module (beam.io), not from the top-level apache_beam package itself.
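To illustrate why the error mentions a 'module' object, here is a small self-contained sketch. It is not Beam code: it builds a hypothetical toy package, fakebeam, whose Read class exists only on its io submodule, mirroring the layout described above. It also includes a hypothetical helper, where_is, that checks whether a real importable module defines a given attribute. Running it against apache_beam assumes Beam is installed in the same environment; otherwise the helper returns None.

```python
import importlib
import types

# Hypothetical toy package laid out like the answer describes apache_beam:
# Read is defined on the io submodule, not on the package itself.
# "fakebeam" and this Read class are illustrative stand-ins, not Beam code.
fakebeam = types.ModuleType("fakebeam")
fakebeam.io = types.ModuleType("fakebeam.io")


class Read(object):
    """Placeholder for a read transform; it just remembers its source."""

    def __init__(self, source):
        self.source = source


# Attach Read to the submodule only.
fakebeam.io.Read = Read


def where_is(module_name, attr):
    """Return True/False for whether an importable module defines attr.

    Returns None when the module cannot be imported at all (for example,
    when apache_beam is not installed in the current environment).
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


if __name__ == "__main__":
    # Looking Read up on the package object raises the same kind of
    # AttributeError that beam.Read raised in the traceback.
    try:
        fakebeam.Read("project:dataset.table")
    except AttributeError as exc:
        print("AttributeError:", exc)

    # Looking it up on the io submodule succeeds.
    print("fakebeam.io.Read ->", fakebeam.io.Read("project:dataset.table").source)

    # Against a real installation (assumed; may be absent), compare locations.
    for name in ("apache_beam", "apache_beam.io"):
        print(name, "defines Read:", where_is(name, "Read"))
```

Running a quick hasattr check like this in the same environment that launches the pipeline is a cheap way to confirm where a transform lives before editing the example code.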
