Saravanan Setty

Reputation: 438

How KSQL Windowed query works and maximum window size

I have two questions about KSQL queries that use windowing:

  1. Let's say I have the following aggregation query:

    SELECT id, COUNT(*) FROM testtopic_stream WINDOW TUMBLING (SIZE 30 DAYS) GROUP BY id;

Are the results of the aggregation above calculated using only the new tick that comes in, or does the query actually go through all the data for the last 30 days and then perform the aggregation?

  2. What is the maximum possible window size for queries? I am able to set up a window of even 30 days, and the query seems to work fine for now. Is there a recommended maximum window size?

Upvotes: 0

Views: 1687

Answers (1)

Matthias J. Sax

Reputation: 62350

It depends on the auto.offset.reset strategy. If you set it to "earliest", the query will consume all data from the underlying stream/topic (note that "all" really means all data that is stored in the topic, i.e., how much data that is depends on the topic's retention setting). If you set the config to "latest", which is the default, the query will only process data that is written by upstream producers after the query was started.

In both cases, the size of the window has no impact on what data will be processed.
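As a minimal sketch, this is how the offset strategy might be set for the query from the question. It assumes the statements are run in the KSQL CLI, where a session-level SET applies to queries started afterwards; the stream name is taken from the question.

    -- Assumption: run in the KSQL CLI before starting the query.
    -- 'earliest' makes the query read everything still retained in the topic.
    SET 'auto.offset.reset' = 'earliest';

    -- The window size only determines how records are grouped, not which records are read.
    SELECT id, COUNT(*) FROM testtopic_stream WINDOW TUMBLING (SIZE 30 DAYS) GROUP BY id;

With 'latest' instead, the same query would only count records produced after it was started.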

There is no limit on the window size. You can pick any size you want. Note: for tumbling windows, a smaller window size in fact increases the storage requirement, while a larger window size reduces it, because fewer windows need to be maintained in parallel.

Upvotes: 4
