Reputation: 511
When trying to run a pipeline on the Dataflow service, I specify the staging and temp buckets (in GCS) on the command line. When the program executes, I get a RuntimeException before my pipeline runs, where the root cause is that I'm missing something in the path.
Caused by: java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
...
Caused by: java.lang.IllegalArgumentException: Missing object or bucket in path: 'gs://df-staging-bucket-57763/', did you mean: 'gs://some-bucket/df-staging-bucket-57763'?
The bucket `gs://df-staging-bucket-57763/` already exists in my project, and I have access to it. What do I need to add to make this work?
Upvotes: 8
Views: 2501
Reputation: 29
Update the run configuration as follows:
Upvotes: -1
Reputation: 511
The DataflowRunner requires that the staging and temp locations point to a path *within* a bucket rather than the top level of a bucket. Adding a directory to each of the `stagingLocation` and `gcpTempLocation` arguments (for example, `--stagingLocation=gs://df-staging-bucket-57763/staging` or `--tempLocation=gs://df-staging-bucket-57763/temp`) will be sufficient to run the pipeline.
Upvotes: 15