Reputation: 307
I am new to PostgreSQL (specifically, I use Timescale db) and have a question regarding time windows.
Data:
date |customerid|names
2014-01-01|1 |Andrew
2014-01-02|2 |Pete
2014-01-03|2 |Andrew
2014-01-04|2 |Steve
2014-01-05|2 |Stef
2014-01-06|3 |Stef
2014-01-07|1 |Jason
2014-01-08|1 |Jason
The question is: Going back in time x days (viewed from every single row), how many distinct names are there which share the same id?
For x=2 days, the result should look like this:
date |customerid|names |count
2014-01-01|1 |Andrew |1
2014-01-02|2 |Pete |1
2014-01-03|2 |Andrew |2
2014-01-04|2 |Steve |3
2014-01-05|2 |Stef |3
2014-01-06|3 |Stef |1
2014-01-07|1 |Jason |1
2014-01-08|1 |Jason |1
Is this possible in PostgreSQL without using a loop over each single row?
Additional information: The time intervals of the data are not equidistant in reality.
Thank you very much!
Upvotes: 4
Views: 2339
Reputation: 1269445
It would be nice if you could use window functions:
select t.*,
count(distinct name) over (partition by id
order by date
range between interval 'x day' preceding and current row
) as cnt_x
from t;
Alas, that is not possible. So you can use a lateral join:
select t.*, tt.cnt_x
from t left join lateral
(select count(distinct t2.name) as cnt_x
from t t2
where t2.id = t.id and
t2.date >= t.date - interval 'x day' and t2.date <= t.date
) tt
on true;
For performance, you want an index on (id, date, name)
.
Upvotes: 6