Reputation: 17
Intention: I have a bounded data set (~10k records, exposed via a REST API) that needs to be written to BigQuery with some additional processing steps. Since the data set is bounded, I've implemented the BoundedSource interface to read the records in my Apache Beam pipeline.
Problem: all 10k records are read in one shot (and written to BigQuery in one shot as well). Instead, I want to query a small part (for example 200 rows), process it, save it to BigQuery, and only then query the next 200 rows.
I've found that I can use windowing with bounded PCollections, but those windows are created on a time basis (every 10 seconds, for example), whereas I want them on a record-count basis (every 200 records). A sketch of what I tried is below.
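Roughly what I tried (a sketch; MyRow stands in for my record type):

```java
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Fixed windows are defined by time (here: 10 seconds) --
// I found no way to say "every 200 records" with this approach.
PCollection<MyRow> windowed =
    rows.apply(Window.<MyRow>into(FixedWindows.of(Duration.standardSeconds(10))));
```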
Question: How can I implement the described splitting into batches/windows of 200 records? Am I missing something?
My question is similar to this one, but it wasn't answered.
Upvotes: 0
Views: 742
Reputation: 5104
Given a PCollection of rows, you can use GroupIntoBatches to batch these up into a PCollection of sets of rows of a given size.
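A minimal sketch, assuming the Java SDK and a placeholder record type MyRow (GroupIntoBatches operates on key/value pairs, so attach a key first):

```java
import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// rows is the PCollection<MyRow> produced by your BoundedSource read.
// A single constant key is the simplest choice, though it funnels this
// grouping step through one key (limiting its parallelism).
PCollection<KV<Integer, MyRow>> keyed =
    rows.apply("AttachKey", WithKeys.<Integer, MyRow>of(0));

// Emit groups of up to 200 rows each.
PCollection<KV<Integer, Iterable<MyRow>>> batches =
    keyed.apply("BatchInto200", GroupIntoBatches.<Integer, MyRow>ofSize(200));

// Each element of `batches` can then be processed and written to BigQuery
// by a downstream DoFn.
```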
As for reading your input in an incremental way, you can use the split method of BoundedSource to shard your read into several pieces which will then be read independently (possibly on separate workers). For a bounded pipeline, this will still happen in its entirety (all 10k records read) before anything is written, but need not happen on a single worker. You could also insert a Reshuffle to decouple the parallelism between your read and your write.
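As a rough sketch of split, assuming a hypothetical MyRestSource that knows its total record count (totalRecords) and can read an arbitrary range [start, end):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.options.PipelineOptions;

// Inside the hypothetical MyRestSource extends BoundedSource<MyRow>:
// return several 200-record sub-sources so the runner can read them
// independently (possibly on separate workers).
@Override
public List<? extends BoundedSource<MyRow>> split(
    long desiredBundleSizeBytes, PipelineOptions options) throws Exception {
  List<MyRestSource> shards = new ArrayList<>();
  for (long start = 0; start < totalRecords; start += 200) {
    shards.add(new MyRestSource(start, Math.min(start + 200, totalRecords)));
  }
  return shards;
}
```

For the Reshuffle, something like rows.apply(Reshuffle.viaRandomKey()) between the read and the write breaks fusion between the two stages, so the write's parallelism isn't tied to the read's.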
Upvotes: 1