Reputation: 976
My streaming Dataflow pipeline, which reads from Pub/Sub, never writes to BigQuery and logs no errors. The elements enter the node "Write to BigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/GroupByKey", which is created implicitly by this code:
PCollection<TableRow> rows = ...;
rows.apply("Write to BigQuery",
    BigQueryIO.writeTableRows()
        .to(poptions.getOutputTableName())
        .withSchema(...)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
        .withFailedInsertRetryPolicy(InsertRetryPolicy.alwaysRetry())
        .withExtendedErrorInfo());
But elements never leave it, or at least not within the system lag, which is now 45 minutes. This is supposed to be a streaming job; how can I make it flush and write the data? This is Beam version 2.13.0. Thank you.
UPDATE: the Stackdriver log for the step that writes data to BigQuery shows no errors.
I can also add that this works if I use the DirectRunner in the cloud (but only for small amounts of data), and with either runner if I insert row by row using the Java client for BigQuery (but that is at least two orders of magnitude too slow to begin with).
Upvotes: 2
Views: 841
Reputation: 478
You might try changing your retry policy to InsertRetryPolicy.retryTransientErrors(). The alwaysRetry() policy makes the pipeline appear to stop making progress whenever there is a configuration error, for example the BigQuery table not existing or the job not having permission to access it: the failed inserts are retried forever, so they are never reported as failures.
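A minimal sketch of that change, assuming a String table spec and a TableSchema are in scope (the helper and its parameter names are illustrative, not from your pipeline): it switches to retryTransientErrors() and logs the failed rows that withExtendedErrorInfo() exposes via getFailedInsertsWithErr(), so a permanent error such as a missing table shows up in the logs instead of being retried silently.

import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class BigQueryWriteWithErrorLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(BigQueryWriteWithErrorLogging.class);

  // Hypothetical helper: 'table' and 'schema' stand in for your
  // poptions.getOutputTableName() and withSchema(...) arguments.
  static void write(PCollection<TableRow> rows, String table, TableSchema schema) {
    WriteResult result = rows.apply("Write to BigQuery",
        BigQueryIO.writeTableRows()
            .to(table)
            .withSchema(schema)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            // Retry only errors that can succeed on retry; permanent errors
            // (missing table, schema mismatch, missing permissions) fail fast.
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
            .withExtendedErrorInfo());

    // With withExtendedErrorInfo(), failed rows arrive here together with the
    // BigQuery error details, so the misconfiguration is visible in Stackdriver.
    result.getFailedInsertsWithErr()
        .apply("Log failed inserts", ParDo.of(new DoFn<BigQueryInsertError, Void>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            LOG.error("BigQuery insert failed: {} for row {}",
                c.element().getError(), c.element().getRow());
          }
        }));
  }
}

Once the permanent failures stop being retried, the Reshuffle/GroupByKey step should drain normally, and the logged error tells you what to fix (table, schema, or permissions).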
You can also check the worker logs in Stackdriver Logging. Do this by clicking on the "Stackdriver" link in the upper corner of the step log pane. Full directions are in the Dataflow logging documentation.
Upvotes: 1