Reputation: 1722
I would like to count the waves
available (LEFT OUT JOIN
). My problem is that I also have another LEFT OUTER JOIN
on invitations
.
The idea is to have all the stats from the same SQL request. Stats from invitations
and stats from waves
.
An event has many waves An event has many invitations
CREATE TABLE public.events (
id bigint NOT NULL,
uuid uuid DEFAULT public.gen_random_uuid() NOT NULL
name character varying NOT NULL
);
CREATE TABLE public.invitations (
id bigint NOT NULL,
uuid uuid DEFAULT public.gen_random_uuid() NOT NULL,
event_id bigint NOT NULL
);
CREATE TABLE public.invitation_waves (
id bigint NOT NULL,
uuid uuid DEFAULT public.gen_random_uuid() NOT NULL,
name character varying NOT NULL,
wavable_type character varying,
wavable_id bigint,
scheduled_at timestamp(6) without time zone
);
I have tested 2 options
SELECT CAST(sum((case when (waves.id IS NOT NULL) then 1 else 0 end)) AS INTEGER) as total_waves
FROM "events"
LEFT OUTER JOIN "waves" ON "waves"."wavable_type" = 'Event' AND
"waves"."wavable_id" = "events"."id"
LEFT OUTER JOIN "invitations" ON "invitations"."event_id" = "events"."id"
WHERE "events"."uuid" = 'XXX'
GROUP BY "events"."id"
-> It gives me the number of invitations created if above the number of waves
DISTINCT
)SELECT CAST(sum(DISTINCT(case when (waves.id IS NOT NULL) then 1 else 0 end)) AS INTEGER) as total_waves
FROM "events"
LEFT OUTER JOIN "waves" ON "waves"."wavable_type" = 'Event' AND
"waves"."wavable_id" = "events"."id"
LEFT OUTER JOIN "invitations" ON "invitations"."event_id" = "events"."id"
WHERE "events"."uuid" = 'XXX'
GROUP BY "events"."id"
-> Always returns me 1
which is, I suppose, the number of events matching the request (I've scoped it with the uuid).
Not having the DISTINCT
is actually multiplying the correct sum by the number of waves.
Having the DISTINCT
restrict the entire sum to the number of waves.
I'm currently trying this option based on this answer.
It is working but I'm concerned about the dependency between GROUP BY
and the selected columns from the LEFT OUTER JOIN
.
SELECT CAST(sum((case
when (invitations.status = 4 or invitations.status = 5 or invitations.status = 6) then 1
else 0 end)) AS INTEGER) as total_validated,
total_waves,
waves_scheduled
FROM "events"
LEFT OUTER JOIN "invitations" ON "invitations"."event_id" = "events"."id"
LEFT OUTER JOIN (SELECT distinct on (wavable_id)
wavable_id as event_id,
CAST(sum((case when (invitation_waves.id IS NOT NULL) then 1 else 0 end)) AS INTEGER) as total_waves,
CAST(sum((case when (invitation_waves.scheduled_at IS NOT NULL) then 1 else 0 end)) AS INTEGER) as waves_scheduled
FROM invitation_waves
WHERE wavable_type = 'Event'
GROUP BY 1
ORDER BY wavable_id) as "wave_stats" on "wave_stats"."event_id" = "events"."id"
WHERE "events"."uuid" = '1ee5ec72-6f3c-404c-871c-5c5724f6a1ed'
GROUP BY "events"."id", total_waves, waves_scheduled
Thanks to @Thorsten, I pick an hybrid version between WITH
and direct LEFT JOIN
.
Invitation table was too big (and also joined to another to use a WITH
and FILTER
I suppose and I needed to GROUP BY
every generated column from the with
.
So here is my final version
WITH "wave_stats" AS (SELECT wavable_id as event_id,
COUNT(DISTINCT id) AS total_waves,
COUNT(DISTINCT id) FILTER (where scheduled_at IS NOT NULL) AS waves_scheduled
FROM "waves"
WHERE "waves"."wavable_type" = 'Event'
GROUP BY "waves"."wavable_id")
SELECT events.*,
CAST(COUNT(invitations.id) AS INTEGER) AS total_invitations, CAST (sum((case
when (invitations.status = 4 or invitations.status = 5 or invitations.status = 6) then 1
else 0 end)) AS INTEGER) as total_validated, case when wave_stats.waves_scheduled is null then 0 else wave_stats.waves_scheduled
end,
case when wave_stats.total_waves is null then 0 else wave_stats.total_waves
end FROM "events"
LEFT OUTER JOIN "invitations" ON "invitations"."event_id" = "events"."id"
LEFT JOIN wave_stats ON wave_stats.event_id = events.id
GROUP BY "events"."id", "total_waves", "waves_scheduled"
Upvotes: 0
Views: 124
Reputation: 95082
You want to count invitations and waves per event. For this to happen you join the invitations and waves to an event, but this gives you a cartesian product, i.e. for an event with, say, three invitations and four waves you'll produce 3 x 4 = 12 rows. Then you try to muddle through somehow to get the counts you want.
Instead of joining the invitations and waves to an event, join the invitation count and the wave count to an event. In other words: aggregate before joining.
WITH
event_invitations AS
(
SELECT
event_id,
COUNT(*) AS total
FROM invitations
GROUP BY event_id
),
event_waves AS
(
SELECT
wavable_id AS event_id,
COUNT(*) AS total,
COUNT(scheduled_at) AS scheduled
FROM waves
WHERE wavable_type = 'Event'
GROUP BY wavable_id
)
SELECT
e.*,
COALESCE(i.total, 0) AS total_invitation_count,
COALESCE(w.total, 0) AS total_wave_count,
COALESCE(w.scheduled, 0) AS scheduled_wave_count
FROM events e
LEFT OUTER JOIN event_invitations i ON i.event_id = e.id
LEFT OUTER JOIN event_waves w ON w.event_id = e.id
ORDER BY e.id;
Upvotes: 1
Reputation: 3180
The following SQL creates the tables and populates them with test values to verify that the query is producing the intended results. The event names are the expected counts for invitations, validated invitations, waves, and scheduled waves.
CREATE TABLE events (
id BIGINT NOT NULL PRIMARY KEY,
UUID UUID DEFAULT gen_random_uuid () NOT NULL,
name CHARACTER VARYING NOT NULL
);
CREATE TABLE invitations (
id BIGINT NOT NULL PRIMARY KEY,
UUID UUID DEFAULT gen_random_uuid () NOT NULL,
event_id BIGINT NOT NULL REFERENCES events (id),
status INTEGER
);
CREATE TABLE invitation_waves (
id BIGINT NOT NULL PRIMARY KEY,
UUID UUID DEFAULT gen_random_uuid () NOT NULL,
name CHARACTER VARYING NOT NULL,
wavable_type CHARACTER VARYING,
wavable_id BIGINT,
scheduled_at TIMESTAMP(6) WITHOUT TIME ZONE
);
INSERT INTO
events (id, name)
VALUES
(1, '0 0 0 0'),
(2, '1 0 0 0'),
(3, '2 1 0 0'),
(4, '0 0 1 0'),
(5, '0 0 2 1'),
(6, '1 0 1 0'),
(7, '2 1 1 0'),
(8, '1 0 2 1'),
(9, '2 1 2 1');
INSERT INTO
invitations (id, event_id, status)
VALUES
(1, 2, 0),
(2, 3, 0),
(3, 3, 4),
(4, 6, 0),
(5, 7, 0),
(6, 7, 5),
(7, 8, 0),
(8, 9, 0),
(9, 9, 6);
INSERT INTO
invitation_waves (id, name, wavable_type, wavable_id, scheduled_at)
VALUES
(1, 'wave1', 'Event', 4, NULL),
(2, 'wave2', 'Event', 5, NULL),
(3, 'wave3', 'Event', 5, '2024-01-01 00:00:00'::TIMESTAMP),
(4, 'wave4', 'Event', 6, NULL),
(5, 'wave5', 'Event', 7, NULL),
(6, 'wave6', 'Event', 8, NULL),
(7, 'wave7', 'Event', 8, '2024-01-01 00:00:00'::TIMESTAMP),
(8, 'wave8', 'Event', 9, NULL),
(9, 'wave9', 'Event', 9, '2024-01-01 00:00:00'::TIMESTAMP);
The following demonstrates using COUNT(DISTINCT ...) FILTER (WHERE ...)
to report counts for multiple conditions within a single query. Without DISTINCT
, the query would report incorrect counts for events that have both invitations and waves when either has more than one associated with the event.
SELECT
e.id AS event_id,
e.name,
COUNT(DISTINCT i.id) AS total_invitations,
COUNT(DISTINCT i.id) FILTER (WHERE i.status IN (4, 5, 6)) AS total_validated,
COUNT(DISTINCT w.id) AS total_waves,
COUNT(DISTINCT w.id) FILTER (WHERE w.scheduled_at IS NOT NULL) AS scheduled_waves
FROM
events e
LEFT JOIN invitations i
ON i.event_id = e.id
LEFT JOIN invitation_waves w
ON w.wavable_type = 'Event' AND
w.wavable_id = e.id
GROUP BY
e.id
ORDER BY
e.id;
Running the query with the test data produces the following results:
event_id | name | total_invitations | total_validated | total_waves | scheduled_waves |
---|---|---|---|---|---|
1 | 0 0 0 0 | 0 | 0 | 0 | 0 |
2 | 1 0 0 0 | 1 | 0 | 0 | 0 |
3 | 2 1 0 0 | 2 | 1 | 0 | 0 |
4 | 0 0 1 0 | 0 | 0 | 1 | 0 |
5 | 0 0 2 1 | 0 | 0 | 2 | 1 |
6 | 1 0 1 0 | 1 | 0 | 1 | 0 |
7 | 2 1 1 0 | 2 | 1 | 1 | 0 |
8 | 1 0 2 1 | 1 | 0 | 2 | 1 |
9 | 2 1 2 1 | 2 | 1 | 2 | 1 |
Upvotes: -1