Reputation: 11
I have a ParDo that uses state and timers with a periodically updating PcollectionView as sideInput to that parDo; google dataflow will throw an exception that timers are not allowed in such a case. Is there another way to feed config data to the parDo with out sideInput? Essentially, the sideInput was a map of config data that was reading from datastore about every 24 hours.
I am currently trying to see if I can create a ParDo before the one with state and timers to periodically update the config, but I don't see how we can access that map from within the next ParDo. Any suggestions?
Note: This pipeline is running in streaming mode with a global window and reading from pubsub messages as they arrive. Datastore is used to hold data needed to decide when to output an element to a pubsub topic.
Upvotes: 1
Views: 104
Reputation: 342
Instead of using state timers to update the side input, you can use a fixed window to periodically update your PCollectionView with your data source:
PCollectionView<Map<String,String>> sideInput = pipeline
.apply(notifications)
.apply(
Window.<Long>into(FixedWindows.of(Duration.standardMinutes(refreshMinutes)))
.triggering(
Repeatedly.forever(AfterPane.elementCountAtLeast(1))
)
.withAllowedLateness(Duration.ZERO)
.discardingFiredPanes()
)
.apply( /* query data source */ )
.apply(View.<Map<String,String>>asSingleton());
Upvotes: 0