Reputation: 397
I have a PCollection in GCP Dataflow/Apache Beam. Instead of processing it one by one, I need to combine "by N". Something like grouped(N)
. So, in case of bounded processing, it will group by 10 items in batch and last batch with whatever left.
Is this possible in Apache Beam?
Upvotes: 1
Views: 1555
Reputation: 3688
Edit, looks like: Google Dataflow "elementCountExact" aggregation
You should be able to do something similar by assigning elements to global window and using AfterPane.elementCountAtLeast(N)
. You still need to account for what what if there isn’t enough elements to fire the trigger. You could use this:
Repeatedly.forever(AfterFirst.of(
AfterPane.elementCountAtLeast(N),
AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(X))))
But you should ask yourself why do you need this heuristic in the first place, there probably is more idomatice way to solve your problem. Read about Data-Driven Triggers
in Beam’s programming guide
Upvotes: 3