Reputation: 63
I am trying to get the count of certain types of records in a related table. I am using a left join.
I have a query that isn't right and one that is. The correct results query has a higher execution cost. I'd like to use the first approach, if I can correct the results.
See http://sqlfiddle.com/#!15/7c20b/5/2
.
CREATE TABLE people(
id SERIAL,
name varchar not null
);
CREATE TABLE pets(
id SERIAL,
name varchar not null,
kind varchar not null,
alive boolean not null default false,
person_id integer not null
);
INSERT INTO people(name) VALUES
('Chad'),
('Buck'); --can't keep pets alive
INSERT INTO pets(name, alive, kind, person_id) VALUES
('doggio', true, 'dog', 1),
('dog master flash', true, 'dog', 1),
('catio', true, 'cat', 1),
('lucky', false, 'cat', 2);
My goal is to get a table back with ALL of the people and the counts of the KINDS of pets they have alive:
| ID | ALIVE_DOGS_COUNT | ALIVE_CATS_COUNT |
|----|------------------|------------------|
| 1 | 2 | 1 |
| 2 | 0 | 0 |
In our production app (not really pets) there would be about 100,000 dead dogs and cats per person. I was hoping to filter all the 'dead' stuff out before the count. I have the slower query in production.
-- Returns incorrect results. When a person has an alive pet, the records are correct, but excludes people that dont currently have
-- an alive pet, fastest of three...
SELECT
people.id,
COALESCE(SUM(case when pets.kind='dog' then 1 else 0 end),0) as alive_dogs_count,
COALESCE(SUM(case when pets.kind='cat' then 1 else 0 end),0) as alive_cats_count
FROM
people
LEFT JOIN pets on people.id = pets.person_id
WHERE
pets.alive = true
GROUP BY people.id;
-- Returns correct results, but looks like the execution plan costs a little more.
SELECT people.id,
(SELECT COUNT(pets.id) FROM pets WHERE pets.person_id = people.id AND pets.alive = true AND pets.kind = 'dog') as alive_dogs_count,
(SELECT COUNT(pets.id) FROM pets WHERE pets.person_id = people.id AND pets.alive = true AND pets.kind = 'cat') as alive_cats_count
FROM people;
-- Returns correct results, but looks like the execution plan costs significantly more.
SELECT
people.id,
COALESCE(SUM(case when pets.alive = true AND pets.kind='dog' then 1 else 0 end),0) as alive_dogs_count,
COALESCE(SUM(case when pets.alive = true AND pets.kind='cat' then 1 else 0 end),0) as alive_cats_count
FROM
people
LEFT JOIN pets on people.id = pets.person_id
GROUP BY people.id;
How do I get the LEFT JOIN version working?
Upvotes: 4
Views: 10410
Reputation: 656706
Typically fastest:
SELECT pp.id
, COALESCE(pt.a_dog_ct, 0) AS alive_dogs_count
, COALESCE(pt.a_cat_ct, 0) AS alive_cats_count
FROM people pp
LEFT JOIN (
SELECT person_id
, count(*) FILTER (WHERE kind = 'dog') AS a_dog_ct
, count(*) FILTER (WHERE kind = 'cat') AS a_cat_ct
FROM pets
WHERE alive
GROUP BY 1
) pt ON pt.person_id = pp.id;
About the aggregate FILTER
clause:
Indexes are irrelevant here, full table scans will be fastest. Except if alive
is a rare case, then a partial index should help. Like:
CREATE INDEX pets_alive_idx ON pets (person_id, kind) WHERE alive;
I included all columns needed for the query (person_id, kind)
to allow index-only scans.
Typically fastest:
SELECT pp.id
, count(kind = 'dog' OR NULL) AS alive_dogs_count
, count(kind = 'cat' OR NULL) AS alive_cats_count
FROM people pp
LEFT JOIN pets pt ON pt.person_id = pp.id
AND pt.alive
WHERE <some condition to retrieve a small subset>
GROUP BY 1;
You should at least have an index on pets(person_id)
for this (or the partial index from above) - and possibly more, depending on the WHERE
condition.
Related:
Upvotes: 2
Reputation: 10900
JOIN with SUM
I think your original query was something like this:
SELECT people.id, stats.dog, stats.cat
FROM people
JOIN (SELECT person_id, count(kind)filter(where kind='dog') dog, count(kind)filter(where kind='cat') cat FROM pets WHERE alive GROUP BY person_id) stats
ON stats.person_id = people.id
That works smoothly, but you should understand, that the result will miss the people with 0 pets, because of inner join. In order to include people who miss pets, you can:
See the accepted answer above. Credits to @ErwinBrandstetter
Slowness
In contrast to other DBMS', Postgresql doesn't create indexes for foreign keys. One multicolumn index will be more efficient than three single indexes. Extend the foreign key index with extra columns from WHERE and JOIN ON columns in the right order:
CREATE INDEX people_fk_with_kind_alive ON test2 (person_id, alive, kind);
REF: https://postgresql.org/docs/11/indexes-multicolumn.html Of course, your primary keys should be defined. The primary key will be indexed by default.
Upvotes: 0
Reputation: 77876
Your WHERE alive=true
is actually filtering out record for person_id = 2
. Use the below query, push the WHERE alive=true
condition into the CASE
condition as can be noticed here. See your modified Fiddle
SELECT people.id,
pe.alive_dogs_count,
pe.alive_cats_count
FROM people
LEFT JOIN
(
select person_id,
COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
from pets
GROUP BY person_id
) pe on people.id = pe.person_id
(OR) your version
SELECT
people.id,
COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
FROM people
LEFT JOIN pets on people.id = pets.person_id
GROUP BY people.id;
Upvotes: 0