Reputation: 37
I have a left table:
CREATE TABLE tab (
code int
, max date
, min date
);
And a right table with user activity (one line per user per day):
CREATE TABLE activity (
date date
, "user" text
);
Sample data:
Table tab
(with the added column I am trying to compute):
Code Min Max (Expected output)
201 2019-1-8 2019-1-10 3
202 2019-1-8 2019-1-11 3
203 2019-1-11 2019-1-12 2
Table activity
:
Date User
2019-1-8 ABCD001
2019-1-8 ABCD002
2019-1-9 ABCD001
2019-1-9 ABCD003
2019-1-10 ABCD001
2019-1-11 ABCD002
2019-1-12 ABCD003
I want as output the columns of tab
, extended with the count of distinct users activity
within that date range
So I need to join on the date range between min
and max
somehow.
Trial 1:
select code, min, max, count(b.distinct user)
from tab a
left join activity b on b.date between a.min and a.max
group by 1,2,3
Trial 2:
select code, min, max, count(b.distinct user)
from tab a
left join activity b on b.date <= a.min and b.date >=a.max and b.date = a.min
group by 1,2,3
I ran different versions of the above code, but it either takes ages to run or fails.
Upvotes: 1
Views: 1026
Reputation: 656734
I suggest a LEFT JOIN LATERAL
:
SELECT a.code, a.min, a.max, b.ct
FROM tab a
LEFT JOIN LATERAL (
SELECT count(DISTINCT b."user") AS ct
FROM activity b
WHERE b.date BETWEEN a.min AND a.max
) b ON true;
A query like this can be expensive if ranges are not trivially small. Each row on the left involves a range of rows on the right. A DISTINCT
count is expensive on top of it, for big ranges. But this should be the fastest query possible.
Support it with an index on activity(date, "user")
. Only add the 2nd index column if you get index-only scans out of it.
user
is a reserved word, btw. Don't use it as identifier.
And I wouldn't use date
, min
or max
either. Names of basic data types and functions as identifiers can lead to confusing errors.
Related:
Upvotes: 1