maurocchi

Reputation: 688

Querying a metrics schema on PostgreSQL

I have the following schema representing a simple metrics storage:

CREATE TABLE targets (
    target varchar
);

CREATE TABLE reads (
    at timestamp without time zone,
    target varchar
);

CREATE TABLE updates (
    at timestamp without time zone,
    target varchar
);

The relations reads and updates store the occurrence of an event on a specific target at a given time.

Here is some sample data:

COPY targets (target) FROM stdin;
A
B
C
\.

COPY reads (at, target) FROM stdin;
1970-01-01 03:40:00 A
1970-01-01 06:00:00 B
1970-01-01 05:00:00 A
1970-01-03 05:00:00 A
1970-01-04 01:00:00 B
\.

COPY updates (at, target) FROM stdin;
1970-01-01 01:00:00 A
1970-01-01 01:00:00 B
1970-01-01 02:00:00 A
1970-01-01 04:00:00 A
1970-01-02 01:00:00 A
1970-01-02 01:00:00 B
1970-01-04 01:00:00 B
\.

I would like to get a report that counts the occurrences of each metric by date for each target, similar to the following query (optionally without the "zero" rows), but in a more efficient way:

select t.target, day::date,
    coalesce((select count(*) from updates where target = t.target and at::date = day), 0) updates,
    coalesce((select count(*) from reads   where target = t.target and at::date = day), 0) reads
from 
    generate_series('1970-01-01'::date, '1970-01-04'::date, '1 day'::interval) day,
    targets t
order by target, day;

 target |    day     | updates | reads 
--------+------------+---------+-------
 A      | 1970-01-01 |       3 |     2
 A      | 1970-01-02 |       1 |     0
 A      | 1970-01-03 |       0 |     1
 A      | 1970-01-04 |       0 |     0
 B      | 1970-01-01 |       1 |     1
 B      | 1970-01-02 |       1 |     0
 B      | 1970-01-03 |       0 |     0
 B      | 1970-01-04 |       1 |     1
 C      | 1970-01-01 |       0 |     0
 C      | 1970-01-02 |       0 |     0
 C      | 1970-01-03 |       0 |     0
 C      | 1970-01-04 |       0 |     0

Any suggestions?

Upvotes: 1

Views: 87

Answers (1)

Patrick

Reputation: 32306

You can solve this with a FULL JOIN on sub-queries doing the counting:

SELECT target, day, updates, reads
FROM (
    SELECT target, at::date AS day, count(*) AS updates FROM updates GROUP BY 1, 2
  ) num_updates
FULL JOIN (
    SELECT target, at::date AS day, count(*) AS reads FROM reads GROUP BY 1, 2
  ) num_reads USING (target, day)
WHERE day BETWEEN '1970-01-01'::date AND '1970-01-04'::date
ORDER BY 1, 2;

This will not produce any rows where both updates and reads are 0, and where only one of the two has no events it shows NULL instead of 0:

 target |    day     | updates | reads 
--------+------------+---------+-------
 A      | 1970-01-01 |       3 |     2
 A      | 1970-01-02 |       1 |     
 A      | 1970-01-03 |         |     1
 B      | 1970-01-01 |       1 |     1
 B      | 1970-01-02 |       1 |      
 B      | 1970-01-04 |       1 |     1
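
Since the question asks about efficiency, one possible variation is to move the date filter into each sub-query, so that rows outside the range are discarded before they are grouped. This is only a sketch: it assumes the same range as above, written as a half-open interval on the raw at column, and whether it helps depends on the data volume and on whether an index on at exists (neither is stated in the question).

```sql
SELECT target, day, updates, reads
FROM (
    SELECT target, at::date AS day, count(*) AS updates
    FROM updates
    -- half-open range: everything from Jan 1 up to, but not including, Jan 5
    WHERE at >= '1970-01-01' AND at < '1970-01-05'
    GROUP BY 1, 2
  ) num_updates
FULL JOIN (
    SELECT target, at::date AS day, count(*) AS reads
    FROM reads
    WHERE at >= '1970-01-01' AND at < '1970-01-05'
    GROUP BY 1, 2
  ) num_reads USING (target, day)
ORDER BY 1, 2;
```

Filtering on at directly, rather than on at::date, keeps the condition in a form that a plain index on at could serve.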

If you do want 0's but not rows with both updates = 0 AND reads = 0 then do a simple coalesce() on both columns in the select list:

SELECT target, day, coalesce(updates, 0) AS updates, coalesce(reads, 0) AS reads
...
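
Spelled out in full, and assuming the rest of the statement is the same FULL JOIN query as above, that would look roughly like this:

```sql
SELECT target, day,
       coalesce(updates, 0) AS updates,  -- NULL means no updates for that target and day
       coalesce(reads, 0)   AS reads     -- NULL means no reads for that target and day
FROM (
    SELECT target, at::date AS day, count(*) AS updates FROM updates GROUP BY 1, 2
  ) num_updates
FULL JOIN (
    SELECT target, at::date AS day, count(*) AS reads FROM reads GROUP BY 1, 2
  ) num_reads USING (target, day)
WHERE day BETWEEN '1970-01-01'::date AND '1970-01-04'::date
ORDER BY 1, 2;
```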

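If you also want the rows where both counts are NULL (or 0), generate the date range with generate_series(), CROSS JOIN it with targets to get the full Cartesian product of days and targets, and then LEFT JOIN the sub-queries to it. A plain JOIN without a join condition is a syntax error in PostgreSQL, and the cast in the select list turns the generated timestamps back into dates:

SELECT target, day::date AS day, updates, reads
FROM generate_series('1970-01-01'::date, '1970-01-04'::date, interval '1 day') d(day)
CROSS JOIN targets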
LEFT JOIN (
    SELECT target, at::date AS day, count(*) AS updates FROM updates GROUP BY 1, 2
  ) num_updates USING (target, day)
LEFT JOIN (
    SELECT target, at::date AS day, count(*) AS reads FROM reads GROUP BY 1, 2
  ) num_reads USING (target, day)
ORDER BY 1, 2;
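
To get 0's instead of NULLs in the full report, the same coalesce() idea can be combined with the Cartesian product of days and targets. The following is a sketch under a few assumptions: the aliases days, g and ts are made up for illustration, and the generated values are cast to date in a sub-query so that the join columns have the same type on both sides.

```sql
SELECT target, day,
       coalesce(updates, 0) AS updates,  -- NULL from the LEFT JOIN means no events that day
       coalesce(reads, 0)   AS reads
FROM (
    -- generate_series() with an interval step yields timestamps; cast them back to dates
    SELECT ts::date AS day
    FROM generate_series('1970-01-01'::date, '1970-01-04'::date, interval '1 day') AS g(ts)
  ) days
CROSS JOIN targets                       -- every day paired with every target
LEFT JOIN (
    SELECT target, at::date AS day, count(*) AS updates FROM updates GROUP BY 1, 2
  ) num_updates USING (target, day)
LEFT JOIN (
    SELECT target, at::date AS day, count(*) AS reads FROM reads GROUP BY 1, 2
  ) num_reads USING (target, day)
ORDER BY 1, 2;
```

With the sample data from the question this should return one row per target and day, twelve rows in total, including the all-zero rows for target C.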

Upvotes: 1
