Reputation: 406
How can I get GroupByKey to trigger early results, rather than wait for all the data to arrive (which in my case is a pretty long time).I tried to split my input PCollection into windows with an early trigger, but it just doesn`t work. It still waits for all the data to arrive before giving out the results.
PCollection<List<String>> input = ...
PCollection<KV<Integer,List<String>>> keyedInput = input.apply(ParDo.of(new AddArbitraryKey()))
keyedInput.apply(Window<KV<Integer,List<String>>>into(
FixedWindows.of(Duration.standardSeconds(1)))
.triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
.withAllowedLateness(Duration.ZERO).discardingFiredPanes())
.apply(GroupByKey.<Integer,List<String>>create())
.apply(ParDo.of(new RemoveArbitraryKey()))
.apply(ParDo.of(new FurtherProcessing())
I am doing this to prevent fusing . The AddArbitraryKey transform outputs its elements with Timestamp. However, GroupByKey holds up everything until all the data arrives (for all the windows) . Could someone please tell me how i can get it to trigger early. Thank You .
Upvotes: 3
Views: 1473
Reputation: 17913
To prevent fusion, it's better to use the transform Reshuffle.viaRandomKey()
which performs better and makes sure to not introduce any additional triggering delays.
Upvotes: 1
Reputation: 1901
You can install a trigger like
Repeatedly
.forever(AfterProcessingTime
.pastFirstElementInPane()
.plusDuration(Duration.standardMinutes(1))
.orFinally(AfterWatermark.pastEndOfWindow())
.discardingFiredPanes()
Or
AfterWatermark.pastEndOfWindow()
.withEarlyFirings(
AfterProcessingTime
.pastFirstElementInPane()
.plusDuration(Duration.standardMinutes(1))
Upvotes: 2