Thomas Groh

Reputation: 511

Missing object or bucket in path when running on Dataflow

When trying to run a pipeline on the Dataflow service, I specify the staging and temp buckets (in GCS) on the command line. When the program executes, I get a RuntimeException before my pipeline runs, where the root cause is that I'm missing something in the path.

Caused by: java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions) ... Caused by: java.lang.IllegalArgumentException: Missing object or bucket in path: 'gs://df-staging-bucket-57763/', did you mean: 'gs://some-bucket/df-staging-bucket-57763'?

gs://df-staging-bucket-57763/ already exists in my project, and I have access to it. What do I need to add to make this work?
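For reference, an invocation along these lines reproduces the error (the main class and project name here are placeholders, not taken from my actual setup):

    mvn compile exec:java \
      -Dexec.mainClass=com.example.MyPipeline \
      -Dexec.args="--runner=DataflowRunner \
        --project=my-project \
        --stagingLocation=gs://df-staging-bucket-57763/ \
        --tempLocation=gs://df-staging-bucket-57763/"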

Upvotes: 8

Views: 2501

Answers (2)

user1527893

Reputation: 29

Update the run configuration as follows:

  1. Under the Pipeline Arguments tab, uncheck the "Use Default Dataflow options" flag and select the pipeline arguments manually.
  2. Leave the "Cloud Storage staging location" field blank.

Upvotes: -1

Thomas Groh

Reputation: 511

The DataflowRunner requires that the staging and temp locations be paths within a bucket rather than the top level of a bucket. Adding a directory to each of the stagingLocation and gcpTempLocation arguments (for example, --stagingLocation=gs://df-staging-bucket-57763/staging and --tempLocation=gs://df-staging-bucket-57763/temp) is sufficient to run the pipeline.
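To illustrate, here is a minimal sketch of setting these options programmatically instead of on the command line. The class name is hypothetical, and the project/region are assumed to arrive via the remaining command-line arguments:

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class StagingLocationExample {
      public static void main(String[] args) {
        // Parse --project, --region, etc. from the command line.
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args)
                .as(DataflowPipelineOptions.class);

        // Both locations must name a path *inside* the bucket,
        // not the bucket root (gs://bucket/ alone fails validation).
        options.setStagingLocation("gs://df-staging-bucket-57763/staging");
        options.setGcpTempLocation("gs://df-staging-bucket-57763/temp");
        options.setRunner(DataflowRunner.class);

        Pipeline pipeline = Pipeline.create(options);
        // ... apply transforms here ...
        pipeline.run();
      }
    }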

Upvotes: 15
