i_am_cris

Reputation: 627

Dataflow Pipeline - “Processing stuck in step <STEP_NAME> for at least <TIME> without outputting or completing in state finish…”

Since I'm not allowed to ask my question in the same thread where another person has the same problem (but isn't using a template), I'm creating this new thread.

The problem: I'm creating a Dataflow job from a template in GCP to ingest data from Pub/Sub into BigQuery. Everything works fine until the job actually runs: the job gets "stuck" and never writes anything to BigQuery.

There isn't much I can do, because I can't choose the Beam version when using the template. This is the error:

Processing stuck in step WriteSuccessfulRecords/StreamingInserts/StreamingWriteTables/StreamingWrite for at least 01h00m00s without outputting or completing in state finish
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
  at java.util.concurrent.FutureTask.get(FutureTask.java:191)
  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:803)
  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:867)
  at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:140)
  at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:112)
  at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown Source)

Any ideas how to get this to work?

Upvotes: 0

Views: 2625

Answers (3)

Zeeshan

Reputation: 1278

I was getting the same error, and the reason was that I had created an empty BigQuery table without specifying a schema. Make sure to create the BigQuery table with a schema before loading data into it via Dataflow.
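For reference, here is a minimal sketch of creating the table with an explicit schema using the google-cloud-bigquery Java client; the dataset, table, and field names are placeholders you would replace with your own:

  import com.google.cloud.bigquery.BigQuery;
  import com.google.cloud.bigquery.BigQueryOptions;
  import com.google.cloud.bigquery.Field;
  import com.google.cloud.bigquery.Schema;
  import com.google.cloud.bigquery.StandardSQLTypeName;
  import com.google.cloud.bigquery.StandardTableDefinition;
  import com.google.cloud.bigquery.TableId;
  import com.google.cloud.bigquery.TableInfo;

  public class CreateOutputTable {
    public static void main(String[] args) {
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Hypothetical schema matching the Pub/Sub messages the pipeline writes
      Schema schema = Schema.of(
          Field.of("message", StandardSQLTypeName.STRING),
          Field.of("publish_time", StandardSQLTypeName.TIMESTAMP));

      // Create the table before starting the Dataflow job
      TableId tableId = TableId.of("my_dataset", "my_table");
      bigquery.create(TableInfo.of(tableId, StandardTableDefinition.of(schema)));
    }
  }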

Upvotes: 0

rsantiago

Reputation: 2099

The issue is coming from the step WriteSuccessfulRecords/StreamingInserts/StreamingWriteTables/StreamingWrite, which suggests a problem while writing the data.

Your error can be reproduced with either the Pub/Sub Subscription to BigQuery or the Pub/Sub Topic to BigQuery template by:

  • Configuring the template with a table that doesn't exist.
  • Starting the template with a correct table and deleting it during job execution.

In both cases the "stuck" message appears because data is being read from Pub/Sub, but the pipeline is waiting for the table to become available before inserting it. The error is reported every 5 minutes, and it resolves itself once the table is created.

To verify which table is configured in your template, check the outputTableSpec property under PipelineOptions in the Dataflow UI.
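As a quick sanity check outside the Dataflow UI, a sketch like this with the google-cloud-bigquery Java client confirms whether the table from outputTableSpec actually exists (the project, dataset, and table names here are placeholders):

  import com.google.cloud.bigquery.BigQuery;
  import com.google.cloud.bigquery.BigQueryOptions;
  import com.google.cloud.bigquery.TableId;

  public class CheckOutputTable {
    public static void main(String[] args) {
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // outputTableSpec uses the form project:dataset.table; split it accordingly
      TableId tableId = TableId.of("my-project", "my_dataset", "my_table");

      // getTable returns null when the table does not exist
      if (bigquery.getTable(tableId) == null) {
        System.out.println("Table missing: streaming inserts will stall until it is created.");
      }
    }
  }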

Upvotes: 1

I had the same issue before. The problem was that I used NestedValueProvider to evaluate the Pub/Sub topic/subscription, and this is not supported for templated pipelines.
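For comparison, here is a minimal sketch of passing the subscription straight through as a runtime ValueProvider, which templated pipelines do support (the option and subscription names are illustrative):

  import org.apache.beam.sdk.Pipeline;
  import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
  import org.apache.beam.sdk.options.PipelineOptions;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;
  import org.apache.beam.sdk.options.ValueProvider;

  public class ReadFromPubsub {
    // Expose the subscription as a plain ValueProvider so the template
    // resolves it at runtime, with no NestedValueProvider involved
    public interface Options extends PipelineOptions {
      ValueProvider<String> getInputSubscription();
      void setInputSubscription(ValueProvider<String> value);
    }

    public static void main(String[] args) {
      Options options = PipelineOptionsFactory.fromArgs(args).as(Options.class);
      Pipeline pipeline = Pipeline.create(options);

      // PubsubIO accepts the ValueProvider directly
      pipeline.apply(PubsubIO.readStrings()
          .fromSubscription(options.getInputSubscription()));

      pipeline.run();
    }
  }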

Upvotes: 0
