Reputation: 35
I have sort of a weird one for all the SQL masters out there. I need to get the distinct count of items in a moving window of 14 days. I tried dense_rank but it didn't specify (or I did not know how to) specify the 14 day moving window.
For simplicity my dataset has 3 columns.
A quick example of my endo goal would be the following:
So then for day 1 my uniques would be 4, Day 2 my uniques would be 5 and day 3 my uniques would be 6 (1,2,3,4,5,6)
Once I get to day 15 I would ignore values found in day 1 and only take days 2-15
Any and all help would be greatly appreciated.
Upvotes: 0
Views: 359
Reputation: 172944
Another option to consider - with use of HyperLogLog++ functions - so it consumes less resources and faster
select store, date,
( select hll_count.merge(sketch)
from t.sketches_14days sketch
) distinct_items_count
from (
select store, date,
array_agg(daily_sketch) over(partition by store order by unix_date(date) range between 13 preceding and current row) sketches_14days
from (
select store, date, hll_count.init(item_code) daily_sketch
from your_table
group by store, date
)
) t
Note:
HLL++ functions are approximate aggregate functions. Approximate aggregation typically requires less memory than exact aggregation functions, like COUNT(DISTINCT), but also introduces statistical uncertainty. This makes HLL++ functions appropriate for large data streams for which linear memory usage is impractical, as well as for data that is already approximate.
Upvotes: 0
Reputation: 172944
Consider below approach
select store, date,
( select count(distinct item)
from t.items item
) distinct_items_count
from (
select store, date, any_value(items) items
from (
select store, date,
array_agg(item_code) over(partition by store order by unix_date(date) range between 13 preceding and current row) items
from your_table
)
group by store, date
) t
Upvotes: 2