user101010

Reputation: 117

Apache Beam sliding windows

Let's suppose that I have a 2-hour sliding window that starts every 1 minute. The next step would be to apply a GroupBy transform.

Does Beam hold a separate copy of the overlapping data in memory for each window? Or does it have logic to know that record A belongs to multiple windows?

I would be grateful for an explanation. I could not find any relevant information.

Upvotes: 0

Views: 264

Answers (1)

Anton

Reputation: 2539

It is an implementation detail that should not be observable (or observed) by pipeline authors. A Beam runner may decide to fuse multiple transforms and keep and reuse the elements in memory, or it may not.

I don't know whether this specific topic is covered, but there are a few words about the immutability of elements at the end of the ParDo section in the programming guide. The Beam documentation also has an overall description of the execution model.
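For the logical side of the question only (not how any particular runner stores data), here is a minimal plain-Python sketch of how one timestamp maps to several sliding windows. The function name `sliding_window_starts` and the sample timestamp are made up for illustration; this is not Beam's API, just the arithmetic for half-open windows `[start, start + size)` aligned to the period.

```python
def sliding_window_starts(timestamp, size, period, offset=0):
    """Return the start times of every window [start, start + size)
    that contains `timestamp`. All values are in seconds."""
    # Latest period-aligned window start at or before the timestamp.
    last_start = timestamp - (timestamp - offset) % period
    starts = []
    start = last_start
    # Walk backwards one period at a time while the window still covers the timestamp.
    while start > timestamp - size:
        starts.append(start)
        start -= period
    return starts


SIZE = 2 * 60 * 60   # 2-hour windows
PERIOD = 60          # a new window starts every minute

windows = sliding_window_starts(timestamp=10_000, size=SIZE, period=PERIOD)
print(len(windows))  # 120
```

So with the sizes from the question, each record logically belongs to 120 windows. Whether that turns into 120 physical copies, or one shared element referenced from 120 places, is up to the runner, as said above.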

Upvotes: 1
