adarsh

Reputation: 1141

NPE in dataflow pipeline at SourceOperationExecutor.isSplitOperationTooLargeForDataflowService

My Dataflow pipeline had been running fine until the last run. Today, when I ran it on a new dataset, I started getting a NullPointerException. The exception does not seem to come from my code (none of my classes appear anywhere in the stack trace), as can be seen below.

Is this a bug in the Dataflow framework? Or, since the exception is thrown in isSplitOperationTooLargeForDataflowService, is this dataset (more precisely, the split on it) too large for Dataflow?

Any help/insight would be much appreciated!

2016-07-04T16:27:00.044Z: Error:   (fb0b4effcb8800a6):    
java.lang.NullPointerException
at com.google.cloud.dataflow.sdk.runners.worker.SourceOperationExecutor.isSplitOperationTooLargeForDataflowService(SourceOperationExecutor.java:100)
at com.google.cloud.dataflow.sdk.runners.worker.SourceOperationExecutor.isSplitResponseTooLarge(SourceOperationExecutor.java:92)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:227)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:146)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:164)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:145)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:132)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Upvotes: 0

Views: 63

Answers (1)

Kenn Knowles

Reputation: 6023

This is a bug that was fixed in the 1.4.0 release of the Dataflow SDK. As of this writing, the latest version of the SDK is 1.6.0.

If the Eclipse plugin displays "up to date" at version 1.2.1, it sounds like you are hitting an issue with the plugin. Your problems should be solved if you manually update your pom.xml to use version 1.6.0 of the SDK.
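As a rough sketch of that change, assuming your project pulls in the SDK through the standard `google-cloud-dataflow-java-sdk-all` Maven artifact (your pom.xml may use a version property or a different layout, so adapt accordingly), the dependency entry would look something like this:

```xml
<!-- Assumed dependency block; only the <version> value needs to change. -->
<dependency>
  <groupId>com.google.cloud.dataflow</groupId>
  <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
  <!-- 1.4.0 or later contains the fix; 1.6.0 is the latest as of this writing. -->
  <version>1.6.0</version>
</dependency>
```

After editing the file, rebuild the project so the new SDK jar is actually what gets staged with the job.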

Upvotes: 1
