Reputation: 688
I have the following schema representing a simple metrics storage:
CREATE TABLE targets (
target varchar
);
CREATE TABLE reads (
at timestamp without time zone,
target varchar
);
CREATE TABLE updates (
at timestamp without time zone,
target varchar
);
The relations reads and updates store the occurrence of an event on a specific target at a given time.
Here is some sample data:
COPY targets (target) FROM stdin;
A
B
C
\.
COPY reads (at, target) FROM stdin;
1970-01-01 03:40:00 A
1970-01-01 06:00:00 B
1970-01-01 05:00:00 A
1970-01-03 05:00:00 A
1970-01-04 01:00:00 B
\.
COPY updates (at, target) FROM stdin;
1970-01-01 01:00:00 A
1970-01-01 01:00:00 B
1970-01-01 02:00:00 A
1970-01-01 04:00:00 A
1970-01-02 01:00:00 A
1970-01-02 01:00:00 B
1970-01-04 01:00:00 B
\.
I would like to get a report of all metrics, counting the occurrences by date for each target. It should look like the output of the following query (possibly also without the "zero" rows), but be computed in a more efficient way:
select t.target, day::date,
coalesce((select count(*) from updates where target = t.target and at::date = day), 0) updates,
coalesce((select count(*) from reads where target = t.target and at::date = day), 0) reads
from
generate_series('1970-01-01'::date, '1970-01-04'::date, '1 day'::interval) day,
targets t
order by target, day;
target | day | updates | reads
--------+------------+---------+-------
A | 1970-01-01 | 3 | 2
A | 1970-01-02 | 1 | 0
A | 1970-01-03 | 0 | 1
A | 1970-01-04 | 0 | 0
B | 1970-01-01 | 1 | 1
B | 1970-01-02 | 1 | 0
B | 1970-01-03 | 0 | 0
B | 1970-01-04 | 1 | 1
C | 1970-01-01 | 0 | 0
C | 1970-01-02 | 0 | 0
C | 1970-01-03 | 0 | 0
C | 1970-01-04 | 0 | 0
Any suggestions?
Upvotes: 1
Views: 87
Reputation: 32306
You can solve this with a FULL JOIN on sub-queries doing the counting:
SELECT target, day, updates, reads
FROM (
SELECT target, at::date AS day, count(*) AS updates FROM updates GROUP BY 1, 2
) num_updates
FULL JOIN (
SELECT target, at::date AS day, count(*) AS reads FROM reads GROUP BY 1, 2
) num_reads USING (target, day)
WHERE day BETWEEN '1970-01-01'::date AND '1970-01-04'::date
ORDER BY 1, 2;
This produces no rows where both updates and reads would be 0, and it returns NULLs instead of 0s where only one of the two counts is missing:
target | day | updates | reads
--------+------------+---------+-------
A | 1970-01-01 | 3 | 2
A | 1970-01-02 | 1 |
A | 1970-01-03 | | 1
B | 1970-01-01 | 1 | 1
B | 1970-01-02 | 1 |
B | 1970-01-04 | 1 | 1
If you do want 0s, but no rows with both updates = 0 AND reads = 0, then apply coalesce() to both columns in the select list:
SELECT target, day, coalesce(updates, 0) AS updates, coalesce(reads, 0) AS reads
...
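Written out in full, the coalesce() variant might look like the sketch below. It only combines the FULL JOIN query shown earlier with the modified select list, and it assumes the schema and sample data from the question.

```sql
-- FULL JOIN of the two per-day counts, with missing counts shown as 0.
-- Rows where both counts would be 0 still do not appear, because
-- neither sub-query produces a row for such a (target, day) pair.
SELECT target, day,
       coalesce(updates, 0) AS updates,
       coalesce(reads, 0)   AS reads
FROM (
    SELECT target, at::date AS day, count(*) AS updates
    FROM updates
    GROUP BY 1, 2
) num_updates
FULL JOIN (
    SELECT target, at::date AS day, count(*) AS reads
    FROM reads
    GROUP BY 1, 2
) num_reads USING (target, day)
WHERE day BETWEEN '1970-01-01'::date AND '1970-01-04'::date
ORDER BY 1, 2;
```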
If you want the rows with double NULLs (or 0s) too, then you should generate_series() the date range, CROSS JOIN targets to it for a full Cartesian product, and then LEFT JOIN the sub-queries to that:
SELECT target, day::date AS day, updates, reads
FROM generate_series('1970-01-01'::date, '1970-01-04'::date, interval '1 day') d(day)
CROSS JOIN targets
LEFT JOIN (
SELECT target, at::date AS day, count(*) AS updates FROM updates GROUP BY 1, 2
) num_updates USING (target, day)
LEFT JOIN (
SELECT target, at::date AS day, count(*) AS reads FROM reads GROUP BY 1, 2
) num_reads USING (target, day)
ORDER BY 1, 2;
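To reproduce the report from the question, with 0 instead of NULL and including the all-zero rows for target C, the counts can be wrapped in coalesce(). The following is a sketch under the question's schema. The aliases t, d, u and r are chosen here for illustration, and the explicit ON conditions with a ::date cast are one possible way to make both sides of the day comparison plain dates.

```sql
-- One row per (target, day) in the range, with 0 where nothing was counted.
SELECT t.target,
       d.day::date            AS day,
       coalesce(u.updates, 0) AS updates,
       coalesce(r.reads, 0)   AS reads
FROM generate_series('1970-01-01'::date, '1970-01-04'::date, interval '1 day') AS d(day)
CROSS JOIN targets t                      -- every target for every day
LEFT JOIN (
    SELECT target, at::date AS day, count(*) AS updates
    FROM updates
    GROUP BY 1, 2
) u ON u.target = t.target AND u.day = d.day::date
LEFT JOIN (
    SELECT target, at::date AS day, count(*) AS reads
    FROM reads
    GROUP BY 1, 2
) r ON r.target = t.target AND r.day = d.day::date
ORDER BY 1, 2;
```

Each event table is aggregated once here, rather than once per (target, day) pair as with the correlated sub-selects in the question.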
Upvotes: 1