Abhinav Jha

Reputation: 53

Use of --experiments=no_use_multiple_sdk_containers in Google Cloud Dataflow

Issue summary: Hi, I am using Avro version 1.11.0 for parsing an Avro file and decoding it. We have a custom requirement, so I am not able to use ReadFromAvro. When trying this with Dataflow, a dependency issue arises because avro-python3 version 1.8.2 is already available on the workers. The problem is the class TimestampMillisSchema, which is not present in avro-python3: the pipeline fails stating "Attribute TimestampMillisSchema not found in avro.schema". I then tried passing a requirements file with avro==1.11.0, but then Dataflow was not able to start, giving the error "Error syncing pod", which seems to be caused by a dependency conflict.
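For reference, a minimal sketch of the kind of schema that triggers this: the "timestamp-millis" logical type maps to avro.schema.TimestampMillisSchema in newer Avro releases, a class the preinstalled avro-python3 does not define. The record and field names here are hypothetical, not from the actual pipeline.

```python
import json

# Hypothetical schema illustrating the failure: the "timestamp-millis"
# logical type is represented by avro.schema.TimestampMillisSchema in
# avro 1.11.0, which does not exist in the older avro-python3 package.
schema_json = json.dumps({
    "type": "record",
    "name": "Event",  # placeholder name
    "fields": [
        {"name": "event_time",
         "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
})

# With avro==1.11.0 installed, parsing works; with avro-python3 on the
# worker, code that references TimestampMillisSchema raises instead:
#
#   import avro.schema
#   schema = avro.schema.parse(schema_json)

print(json.loads(schema_json)["fields"][0]["name"])
```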

To solve the issue, we set an experiment flag (--experiments=no_use_multiple_sdk_containers), and the pipeline ran fine.
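For context, this is roughly how we launch the job; the module name, project, and bucket below are placeholders, only the experiment flag and requirements file are the actual settings in question.

```shell
# Hypothetical launch command; my_pipeline.py, my-project and
# gs://my-bucket are placeholders for our real values.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp \
  --requirements_file=requirements.txt \
  --experiments=no_use_multiple_sdk_containers
```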

I want to know a better solution to my issue, and also whether the above flag will affect pipeline performance.

Upvotes: 1

Views: 915

Answers (1)

varun r

Reputation: 204

Please try the Dataflow run command with:

--prebuild_sdk_container_engine=cloud_build --experiments=use_runner_v2

This would use Cloud Build to build the container with your extra dependencies and then use it within the Dataflow run.
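A sketch of a full invocation with those flags, assuming placeholder project, bucket, and Artifact Registry names; the registry push URL is one way to tell Beam where Cloud Build should publish the prebuilt image:

```shell
# Hypothetical command; my_pipeline.py, my-project, gs://my-bucket and
# the registry path are placeholders for your own values.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp \
  --requirements_file=requirements.txt \
  --experiments=use_runner_v2 \
  --prebuild_sdk_container_engine=cloud_build \
  --docker_registry_push_url=us-central1-docker.pkg.dev/my-project/my-repo
```

With the dependencies baked into the image at submission time, the workers skip per-container pip installs, which avoids the startup-time dependency resolution that no_use_multiple_sdk_containers was working around.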

Upvotes: 2
