Reputation: 438
I have two questions about KSQL queries that use windowing:
Let's say I have the following aggregation query:
SELECT id, COUNT(*) FROM testtopic_stream WINDOW TUMBLING (SIZE 30 DAYS) GROUP BY id;
Are the results of the aggregation above calculated using only the new tick that comes in, or does the query actually go through all the data for the last 30 days and then perform the aggregation?
Upvotes: 0
Views: 1687
Reputation: 62350
It depends on the auto.offset.reset strategy. If you set it to "earliest", the query will consume all data from the underlying stream/topic (note that "all" really means all data that is stored in the topic, i.e., how much data that is depends on the topic's retention setting). If you set the config to "latest", which is the default, the query will only process data that is written by upstream producers after the query was started.
In both cases, the size of the window has no impact on what data will be processed.
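As a minimal sketch, assuming you are running the query from the KSQL CLI, the offset strategy can be set for the session before the query is started. The stream name and query are the ones from the question; nothing else is assumed about your setup.

```sql
-- Session-level setting in the KSQL CLI: queries started after this
-- statement read the underlying topic from the beginning.
SET 'auto.offset.reset' = 'earliest';

-- Query from the question, unchanged. The 30-day window only controls how
-- records are grouped, not which records are read from the topic.
SELECT id, COUNT(*) FROM testtopic_stream WINDOW TUMBLING (SIZE 30 DAYS) GROUP BY id;
```

To go back to processing only newly written records, set the value to 'latest' before starting the query. Note that newer ksqlDB versions expect EMIT CHANGES at the end of a continuous query like this one; the syntax above follows the question.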
There is no limit on the window size. You can pick any size you want. Note: for tumbling windows, a smaller window size in fact increases the storage requirement, while a larger window size reduces it, because there are fewer windows that need to be maintained in parallel.
Upvotes: 4