cmo
cmo

Reputation: 192

SELECT rows HAVING result equal to a list of values from DISTINCT

I want to return results matching a key where all parts are present. Given:

create table things (
  id int not null,
  key int not null,
  part character varying(1),
  details character varying(64),
  CONSTRAINT things_pkey PRIMARY KEY (id)
  );

And this:

id key part details
1 1 a 1a details
2 1 b 1b details
3 1 c 1c details
4 2 a 2a details
5 2 b 2b details
6 2 c 2c details
7 3 a 3a details
8 3 c 3c details
9 4 b 4b details
10 5 b 5b details
11 6 b 6b details
12 6 c 6c details
13 7 a 7a details
14 8 a 8a details

I can get to this:

id key part details
1 1 a 1a details
2 1 b 1b details
3 1 c 1c details
4 2 a 2a details
5 2 b 2b details
6 2 c 2c details

With this query:

select * 
from things t
where t.key in (
  select x.key
  from things x
  group by x.key
  having count(distinct part) = 3
);

But I really want to match the distinct part, not just the count of it, e.g. having distinct part = ['a', 'b', 'c']. Can I do this in the query or just do it in the application code?

http://sqlfiddle.com/#!17/38b399/6

Edit

Essentially, what I'm after is a chunk of rows where all the parts for a thing are present. There are eight parts to a thing. They'll be processed and the records in this table deleted. Repeat forever.

Here is the CREATE script from pgAdmin (with reduced noise):

CREATE TABLE public.things (
    id uuid PRIMARY KEY,
    key character varying(255) COLLATE pg_catalog."default" NOT NULL,
    part character varying(3) COLLATE pg_catalog."default" NOT NULL,
    details character varying(1024) COLLATE pg_catalog."default",
    timezone character varying(128) COLLATE pg_catalog."default",
    client_id uuid,
    CONSTRAINT things_client_id_fkey FOREIGN KEY (client_id)
        REFERENCES public.clients (id)

);

CREATE INDEX things_client_id_index ON public.things (client_id);
CREATE UNIQUE INDEX unique_things ON public.things (key, part, client_id);

Upvotes: 0

Views: 2209

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656872

Basically this can be cast as a case of .

Queries checking the distinct count of parts per key have to process all rows of the table. Plus, a distinct count is expensive on top of that. Aggregating and comparing arrays is even more expensive.

If most rows qualify, that does not make a big difference as the whole table will be processed anyway. For a small selection, any such approach performs terribly. Alternative query techniques that can use indexes will shine in comparison.

Ideally, you have a separate table of keys with one row per relevant key. Then use something like this:

SELECT *
FROM   keys k
WHERE  EXISTS (SELECT FROM things WHERE key = k.key AND part = 'a')
AND    EXISTS (SELECT FROM things WHERE key = k.key AND part = 'b')
AND    EXISTS (SELECT FROM things WHERE key = k.key AND part = 'c');

Table things needs a multicolumn index on (part, key) to make it fast.

Even if you don't have a keys table:

SELECT t1.key
FROM   things t1
JOIN   things t2 USING (key)
JOIN   things t3 USING (key)
WHERE  t1.part = 'a'
AND    t2.part = 'b'
AND    t3.part = 'c';

db<>fiddle here

The best query depends on your exact requirements for filters and result format, and the exact schema definition.

Related:

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269913

I think count(distinct) does what you want. If you wanted to explicitly specify the parts, you could use:

where t.key in (
  select x.key
  from things x
  group by x.key
  having array_agg(distinct part order by part)::text[] = array['a', 'b', 'c']
);

Upvotes: 0

Related Questions