Pierre

Reputation: 21

Google Cloud Dataflow jobs failing (IO errors)

Some of our Dataflow jobs randomly crash while reading source data files.

The following error is written in the job logs (there is nothing in the worker logs):

11 Feb 2016 at 08:30:54
(33b59f945cff28ab): Workflow failed. 
Causes: (fecf7537c059fece): S02:read-edn-file2/TextIO.Read+read-edn-file2    
/ParDo(ff19274a)+ParDo(ff19274a)5+ParDo(ff19274a)6+RemoveDuplicates
/CreateIndex+RemoveDuplicates/Combine.PerKey
/GroupByKey+RemoveDuplicates/Combine.PerKey/Combine.GroupedValues
/Partial+RemoveDuplicates/Combine.PerKey/GroupByKey/Reify+RemoveDuplicates
/Combine.PerKey/GroupByKey/Write failed

We also sometimes get this kind of error (logged in the worker logs):

2016-02-15T10:27:41.024Z: Basic:  S18: (43c8777b75bc373e): Executing operation group-by2/GroupByKey/Read+group-by2/GroupByKey/GroupByWindow+ParDo(ff19274a)19+ParDo(ff19274a)20+ParDo(ff19274a)21+write-edn-file3/ParDo(ff19274a)+write-bq-table-from-clj3/ParDo(ff19274a)+write-bq-table-from-clj3/BigQueryIO.Write+write-edn-file3/TextIO.Write
2016-02-15T10:28:03.994Z: Error:   (af73c53187b7243a): java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone
{
 "code" : 503,
 "errors" : [ {
   "domain" : "global",
   "message" : "Backend Error",
   "reason" : "backendError"
 } ],
 "message" : "Backend Error"
}
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
    at com.google.cloud.dataflow.sdk.runners.worker.TextSink$TextFileWriter.close(TextSink.java:243)
    at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:254)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:191)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:144)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:180)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:161)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:148)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

The source data files are stored in Google Cloud Storage.
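
For context, the failing step in the first error is a plain TextIO read from Cloud Storage. A minimal sketch of that kind of read with the 1.x SDK (the gs:// path is a placeholder, not the real bucket; `options` is as in the launch sketch further down):

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.values.PCollection;

Pipeline p = Pipeline.create(options);

// Read the source EDN files from Cloud Storage; the gs:// path is a
// placeholder, not the real bucket.
PCollection<String> lines = p.apply(
    TextIO.Read.named("read-edn-file2").from("gs://my-bucket/data/*.edn"));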

Data paths are correct, and the jobs generally succeed when relaunched. We didn't experience this issue until the end of January.
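
Since relaunching usually fixes it, the failures look transient, and one can wrap the launch in a retry loop. A minimal sketch, assuming a hypothetical buildPipeline helper that rebuilds the pipeline for each attempt, and that BlockingDataflowPipelineRunner surfaces a failed job as a RuntimeException:

// Relaunch the job up to maxAttempts times on transient failures.
// buildPipeline is a hypothetical helper that rebuilds the pipeline;
// with BlockingDataflowPipelineRunner, run() blocks until the job ends.
int maxAttempts = 3;
for (int attempt = 1; attempt <= maxAttempts; attempt++) {
  try {
    buildPipeline(options).run();
    break;  // job succeeded
  } catch (RuntimeException e) {
    if (attempt == maxAttempts) {
      throw e;  // give up after the last attempt
    }
    System.err.println("Job failed (attempt " + attempt + "), relaunching: " + e);
  }
}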

Jobs are launched with these parameters: --tempLocation='gstoragelocation' --stagingLocation='another gstorage location' --runner=BlockingDataflowPipelineRunner --numWorkers='a few dozen' --zone=europe-west1-d

SDK version: 1.3.0
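
For reference, a sketch of how those flags map onto the 1.x SDK's programmatic options (the bucket names and worker count are placeholders, not the real values):

import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner;

public class JobLauncher {
  public static void main(String[] args) {
    DataflowPipelineOptions options = PipelineOptionsFactory
        .fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

    // Equivalent to the command-line flags above; the values are placeholders.
    options.setRunner(BlockingDataflowPipelineRunner.class);
    options.setTempLocation("gs://my-bucket/temp");
    options.setStagingLocation("gs://my-bucket/staging");
    options.setNumWorkers(30);
    options.setZone("europe-west1-d");
    // ... build and run the pipeline with these options ...
  }
}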

Thanks

Upvotes: 2

Views: 1000

Answers (1)

Nick

Reputation: 3591

As a clearly-marked "backend error", this should be reported on either the Cloud Dataflow Public Issue Tracker or the more general Cloud Platform Public Issue Tracker; there's relatively little anybody on Stack Overflow can do to help you debug it.

Upvotes: 0
