Cory
Cory

Reputation: 63

Postgres LEFT JOIN with SUM, missing records

I am trying to get the count of certain types of records in a related table. I am using a left join.

I have a query that isn't right and one that is. The correct results query has a higher execution cost. I'd like to use the first approach, if I can correct the results.

See http://sqlfiddle.com/#!15/7c20b/5/2.

CREATE TABLE people(
  id SERIAL,
  name varchar not null
);

CREATE TABLE pets(
  id SERIAL,
  name varchar not null, 
  kind varchar not null,
  alive boolean not null default false,
  person_id integer not null
);

INSERT INTO people(name) VALUES
('Chad'),
('Buck'); --can't keep pets alive

INSERT INTO pets(name, alive, kind, person_id) VALUES
('doggio', true, 'dog', 1),
('dog master flash', true, 'dog', 1),
('catio', true, 'cat', 1),
('lucky', false, 'cat', 2);

My goal is to get a table back with ALL of the people and the counts of the KINDS of pets they have alive:

| ID | ALIVE_DOGS_COUNT | ALIVE_CATS_COUNT |
|----|------------------|------------------|
|  1 |                2 |                1 |
|  2 |                0 |                0 |
    

In our production app (not really pets) there would be about 100,000 dead dogs and cats per person. I was hoping to filter all the 'dead' stuff out before the count. I have the slower query in production.

-- Returns incorrect results. When a person has an alive pet, the records are correct, but excludes people that dont currently have
-- an alive pet, fastest of three...
SELECT 
  people.id,
  COALESCE(SUM(case when pets.kind='dog' then 1 else 0 end),0) as alive_dogs_count,
  COALESCE(SUM(case when pets.kind='cat' then 1 else 0 end),0) as alive_cats_count
FROM
  people
  LEFT JOIN pets on people.id = pets.person_id
WHERE 
  pets.alive = true
GROUP BY people.id;

-- Returns correct results, but looks like the execution plan costs a little more.
SELECT people.id,
  (SELECT COUNT(pets.id) FROM pets WHERE pets.person_id = people.id AND pets.alive = true AND pets.kind = 'dog') as alive_dogs_count,
  (SELECT COUNT(pets.id) FROM pets WHERE pets.person_id = people.id AND pets.alive = true AND pets.kind = 'cat') as alive_cats_count
FROM people;

-- Returns correct results, but looks like the execution plan costs significantly more.
SELECT 
  people.id,
  COALESCE(SUM(case when pets.alive = true AND pets.kind='dog' then 1 else 0 end),0) as alive_dogs_count,
  COALESCE(SUM(case when pets.alive = true AND pets.kind='cat' then 1 else 0 end),0) as alive_cats_count
FROM
  people
  LEFT JOIN pets on people.id = pets.person_id
GROUP BY people.id;

How do I get the LEFT JOIN version working?

Upvotes: 4

Views: 10410

Answers (3)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656706

While fetching all or most rows

Typically fastest:

SELECT pp.id
     , COALESCE(pt.a_dog_ct, 0) AS alive_dogs_count
     , COALESCE(pt.a_cat_ct, 0) AS alive_cats_count
FROM   people pp
LEFT   JOIN (
   SELECT person_id
        , count(*) FILTER (WHERE kind = 'dog') AS a_dog_ct
        , count(*) FILTER (WHERE kind = 'cat') AS a_cat_ct
   FROM   pets
   WHERE  alive
   GROUP  BY 1
   ) pt ON pt.person_id = pp.id;

About the aggregate FILTER clause:

Indexes are irrelevant here, full table scans will be fastest. Except if alive is a rare case, then a partial index should help. Like:

CREATE INDEX pets_alive_idx ON pets (person_id, kind) WHERE alive;

I included all columns needed for the query (person_id, kind) to allow index-only scans.

fiddle
Old sqlfiddle

While fetching a small subset

Typically fastest:

SELECT pp.id
     , count(kind = 'dog' OR NULL) AS alive_dogs_count
     , count(kind = 'cat' OR NULL) AS alive_cats_count
FROM   people pp
LEFT   JOIN pets pt ON pt.person_id = pp.id
                   AND pt.alive
WHERE  <some condition to retrieve a small subset>
GROUP  BY 1;

You should at least have an index on pets(person_id) for this (or the partial index from above) - and possibly more, depending on the WHERE condition.

Related:

Upvotes: 2

epox
epox

Reputation: 10900

JOIN with SUM

I think your original query was something like this:

SELECT people.id, stats.dog, stats.cat
  FROM people
  JOIN (SELECT person_id, count(kind)filter(where kind='dog') dog, count(kind)filter(where kind='cat') cat FROM pets WHERE alive GROUP BY person_id) stats
    ON stats.person_id = people.id

That works smoothly, but you should understand, that the result will miss the people with 0 pets, because of inner join. In order to include people who miss pets, you can:

  1. firstly LEFT JOIN,
  2. then GROUP BY joined result
  3. and be ready for NULL values instead of counts.

See the accepted answer above. Credits to @ErwinBrandstetter

Slowness

In contrast to other DBMS', Postgresql doesn't create indexes for foreign keys. One multicolumn index will be more efficient than three single indexes. Extend the foreign key index with extra columns from WHERE and JOIN ON columns in the right order:

CREATE INDEX people_fk_with_kind_alive ON test2 (person_id, alive, kind);

REF: https://postgresql.org/docs/11/indexes-multicolumn.html Of course, your primary keys should be defined. The primary key will be indexed by default.

Upvotes: 0

Rahul
Rahul

Reputation: 77876

Your WHERE alive=true is actually filtering out record for person_id = 2. Use the below query, push the WHERE alive=true condition into the CASE condition as can be noticed here. See your modified Fiddle

SELECT people.id,
pe.alive_dogs_count,
pe.alive_cats_count
FROM people
LEFT JOIN 
(
select person_id, 
  COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
  COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
from pets
GROUP BY person_id
) pe on people.id = pe.person_id

(OR) your version

SELECT 
  people.id,
  COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
  COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
FROM people
  LEFT JOIN pets on people.id = pets.person_id
GROUP BY people.id;

Upvotes: 0

Related Questions