andrew

Reputation: 51

BigQuery unable to insert job. Workflow failed

I need to run a batch job from GCS to BigQuery via Dataflow and Beam. All my files are Avro with the same schema. I've created a Dataflow Java application that succeeds on a smaller set of data (~1 GB, about 5 files). But when I try to run it on a bigger set of data (>500 GB, >1000 files), I receive this error message:

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: Failed to create load job with id prefix 1b83679a4f5d48c5b45ff20b2b822728_6e48345728d4da6cb51353f0dc550c1b_00001_00000, reached max retries: 3, last failed load job: ...

After 3 retries it terminates with:

Workflow failed. Causes: S57....... A work item was attempted 4 times without success....

This step is the load to BigQuery.

Stackdriver says the processing is stuck in step ... for 10m00s ... and:

Request failed with code 409, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes.....

I looked up the 409 error code; it indicates that I might have an existing job, dataset, or table with the same name. I've removed all the tables and re-run the application, but it still shows the same error message.

I am currently limited to 65 workers, each an n1-standard-4 machine.

I believe there are other ways to move the data from GCS to BQ, but I need to demonstrate Dataflow.
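For reference, the core of the pipeline looks roughly like this (a trimmed-down sketch: the bucket, table, and field names are placeholders, and my real schema has many more fields):

    import java.util.Collections;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.AvroIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    import com.google.api.services.bigquery.model.TableFieldSchema;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;

    public class GcsToBigQuery {

      // Placeholder Avro schema; the real one has many more fields.
      private static final Schema AVRO_SCHEMA = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"Rec\","
              + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");

      public static void main(String[] args) {
        Pipeline p = Pipeline.create(
            PipelineOptionsFactory.fromArgs(args).withValidation().create());

        // BigQuery schema matching the placeholder Avro record.
        TableSchema tableSchema = new TableSchema().setFields(Collections.singletonList(
            new TableFieldSchema().setName("id").setType("STRING")));

        p.apply("ReadAvroFromGcs",
                AvroIO.readGenericRecords(AVRO_SCHEMA).from("gs://my-bucket/input/*.avro"))
         .apply("LoadToBigQuery",
                BigQueryIO.<GenericRecord>write()
                    .to("my-project:my_dataset.my_table")
                    .withSchema(tableSchema)
                    // Convert each Avro record to a BigQuery TableRow.
                    .withFormatFunction(rec -> new TableRow().set("id", rec.get("id").toString()))
                    .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
                    .withWriteDisposition(WriteDisposition.WRITE_APPEND));

        p.run();
      }
    }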

Upvotes: 5

Views: 6738

Answers (4)

supriya badgujar

Reputation: 19

For me, permissions were not the issue. I resolved it by converting all the column data types to string; the load was not accepting data types other than string.
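For illustration, this is what such a schema could look like with Beam's TableSchema classes (a sketch with hypothetical column names; the real mapping depends on your data):

    import java.util.Arrays;

    import com.google.api.services.bigquery.model.TableFieldSchema;
    import com.google.api.services.bigquery.model.TableSchema;

    public class AllStringSchema {
      // Hypothetical column names; every field is declared as STRING.
      static TableSchema allStrings() {
        return new TableSchema().setFields(Arrays.asList(
            new TableFieldSchema().setName("id").setType("STRING"),
            new TableFieldSchema().setName("amount").setType("STRING"),     // was e.g. NUMERIC
            new TableFieldSchema().setName("created_at").setType("STRING")  // was e.g. TIMESTAMP
        ));
      }
    }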

Upvotes: 0

Ricco D

Reputation: 7277

Posting the comment of @DeaconDesperado as community wiki: they experienced the same error, and removing the restricted characters (i.e., anything other than Unicode letters, marks, numbers, connectors, dashes, or spaces) from the table name made the error go away.
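A sketch of what that sanitization could look like in Java (the character classes mirror the allowed set above; the class and method names are made up for illustration):

    public class TableNames {
      // Keep only the characters BigQuery allows in table names: Unicode
      // letters (L), marks (M), numbers (N), connectors (Pc), dashes (Pd),
      // and spaces (Zs); replace everything else with an underscore.
      static String sanitize(String rawName) {
        return rawName.replaceAll("[^\\p{L}\\p{M}\\p{N}\\p{Pc}\\p{Pd}\\p{Zs}]", "_");
      }
    }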

Upvotes: 1

Tor Hovland

Reputation: 1649

I got the same problem using "roles/bigquery.jobUser", "roles/bigquery.dataViewer", and "roles/bigquery.user". The issue was only resolved once I granted "roles/bigquery.admin".

Upvotes: 0

Muthu

Reputation: 21

"java.lang.RuntimeException: Failed to create job with prefix beam_load_csvtobigqueryxxxxxxxxxxxxxx, reached max retries: 3, last failed job: null. at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:198)..... "

  • One possible cause could be a privilege issue. Ensure the account that interacts with BigQuery has the "bigquery.jobs.create" permission, which is included in the predefined "BigQuery User" role. A dry-run check is sketched below.
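One rough way to verify this from Java (a sketch using the google-cloud-bigquery client; even a dry-run query job requires bigquery.jobs.create, so a 403 here points at missing permissions):

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryException;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.JobInfo;
    import com.google.cloud.bigquery.QueryJobConfiguration;

    public class CheckJobsCreate {
      public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // A dry-run query: nothing is executed, but the jobs.create
        // permission is still checked by the service.
        QueryJobConfiguration dryRun = QueryJobConfiguration.newBuilder("SELECT 1")
            .setDryRun(true)
            .build();
        try {
          bigquery.create(JobInfo.of(dryRun));
          System.out.println("bigquery.jobs.create looks OK for these credentials.");
        } catch (BigQueryException e) {
          // A 403 here typically means the account lacks bigquery.jobs.create.
          System.out.println("Job creation failed: " + e.getMessage());
        }
      }
    }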

Upvotes: 1
