Reputation: 117
Let's suppose that I have a 2-hour window that starts every minute. The next step would be to apply a GroupBy transform.
Does Beam hold a separate copy of the overlapping data in memory for each window? Or does Apache Beam have logic to know that record A belongs to multiple windows?
I would be grateful for an explanation. I could not really find any relevant information.
Upvotes: 0
Views: 264
Reputation: 2539
It is an implementation detail that should not be observable (or observed) by pipeline authors. Beam or the runner may decide to fuse multiple transforms and keep and reuse the elements in memory, or it may not.
I don't know whether this specific topic is covered, but there are a few words about the immutability of elements at the end of the ParDo
section in the programming guide. An overall description of the Beam execution model is here.
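To make the logical side of the question concrete: with a 2-hour window that starts every minute, each record logically belongs to 120 overlapping windows (7200 s / 60 s). Below is a minimal sketch in plain Python, not the Beam API; the function name `sliding_windows_for` and the sample timestamp are made up for illustration. It only shows which windows a record is assigned to and says nothing about whether a runner physically stores one copy or many.

```python
from typing import List, Tuple


def sliding_windows_for(timestamp: int, size: int, period: int,
                        offset: int = 0) -> List[Tuple[int, int]]:
    """Return every [start, end) window that contains `timestamp`.

    Hypothetical helper for illustration only; all values are in seconds.
    """
    # Start of the most recent window that begins at or before the timestamp.
    last_start = timestamp - (timestamp - offset) % period
    # Walk backwards one period at a time while the window still covers it.
    return [(start, start + size)
            for start in range(last_start, last_start - size, -period)]


SIZE = 2 * 60 * 60   # 2-hour window
PERIOD = 60          # a new window starts every minute

windows = sliding_windows_for(timestamp=10_000, size=SIZE, period=PERIOD)
print(len(windows))  # 120
```

So the GroupBy sees the record once per window it belongs to, as far as the pipeline's semantics go; how that is represented in memory is up to the runner.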
Upvotes: 1