SELECT rows HAVING result equal to a list of values from DISTINCT

Question

I want to return results matching a key where all parts are present. Given:

create table things (
  id int not null,
  key int not null,
  part character varying(1),
  details character varying(64),
  CONSTRAINT things_pkey PRIMARY KEY (id)
  );

And this:

id	key	part	details
1	1	a	1a details
2	1	b	1b details
3	1	c	1c details
4	2	a	2a details
5	2	b	2b details
6	2	c	2c details
7	3	a	3a details
8	3	c	3c details
9	4	b	4b details
10	5	b	5b details
11	6	b	6b details
12	6	c	6c details
13	7	a	7a details
14	8	a	8a details

I can get to this:

id	key	part	details
1	1	a	1a details
2	1	b	1b details
3	1	c	1c details
4	2	a	2a details
5	2	b	2b details
6	2	c	2c details

With this query:

select * 
from things t
where t.key in (
  select x.key
  from things x
  group by x.key
  having count(distinct part) = 3
);

But I really want to match the distinct part, not just the count of it, e.g. having distinct part = ['a', 'b', 'c']. Can I do this in the query or just do it in the application code?

http://sqlfiddle.com/#!17/38b399/6

Edit

Essentially, what I'm after is a chunk of rows where all the parts for a thing are present. There are eight parts to a thing. They'll be processed and the records in this table deleted. Repeat forever.

Here is the CREATE script from pgAdmin (with reduced noise):

CREATE TABLE public.things (
    id uuid PRIMARY KEY,
    key character varying(255) COLLATE pg_catalog."default" NOT NULL,
    part character varying(3) COLLATE pg_catalog."default" NOT NULL,
    details character varying(1024) COLLATE pg_catalog."default",
    timezone character varying(128) COLLATE pg_catalog."default",
    client_id uuid,
    CONSTRAINT things_client_id_fkey FOREIGN KEY (client_id)
        REFERENCES public.clients (id)

);

CREATE INDEX things_client_id_index ON public.things (client_id);
CREATE UNIQUE INDEX unique_things ON public.things (key, part, client_id);

Erwin Brandstetter · Accepted Answer

Basically this can be cast as a case of relational-division.

Queries checking the distinct count of parts per key have to process all rows of the table. Plus, a distinct count is expensive on top of that. Aggregating and comparing arrays is even more expensive.

If most rows qualify, that does not make a big difference as the whole table will be processed anyway. For a small selection, any such approach performs terribly. Alternative query techniques that can use indexes will shine in comparison.

Ideally, you have a separate table of keys with one row per relevant key. Then use something like this:

SELECT *
FROM   keys k
WHERE  EXISTS (SELECT FROM things WHERE key = k.key AND part = 'a')
AND    EXISTS (SELECT FROM things WHERE key = k.key AND part = 'b')
AND    EXISTS (SELECT FROM things WHERE key = k.key AND part = 'c');

Table things needs a multicolumn index on (part, key) to make it fast.

Even if you don't have a keys table:

SELECT t1.key
FROM   things t1
JOIN   things t2 USING (key)
JOIN   things t3 USING (key)
WHERE  t1.part = 'a'
AND    t2.part = 'b'
AND    t3.part = 'c';

db<>fiddle here

The best query depends on your exact requirements for filters and result format, and the exact schema definition.

How to filter SQL results in a has-many-through relation

SELECT rows HAVING result equal to a list of values from DISTINCT

Edit

Answers (2)

Related Questions