Reputation: 192
I want to return results matching a key
where all parts
are present. Given:
create table things (
id int not null,
key int not null,
part character varying(1),
details character varying(64),
CONSTRAINT things_pkey PRIMARY KEY (id)
);
And this:
id | key | part | details |
---|---|---|---|
1 | 1 | a | 1a details |
2 | 1 | b | 1b details |
3 | 1 | c | 1c details |
4 | 2 | a | 2a details |
5 | 2 | b | 2b details |
6 | 2 | c | 2c details |
7 | 3 | a | 3a details |
8 | 3 | c | 3c details |
9 | 4 | b | 4b details |
10 | 5 | b | 5b details |
11 | 6 | b | 6b details |
12 | 6 | c | 6c details |
13 | 7 | a | 7a details |
14 | 8 | a | 8a details |
I can get to this:
id | key | part | details |
---|---|---|---|
1 | 1 | a | 1a details |
2 | 1 | b | 1b details |
3 | 1 | c | 1c details |
4 | 2 | a | 2a details |
5 | 2 | b | 2b details |
6 | 2 | c | 2c details |
With this query:
select *
from things t
where t.key in (
select x.key
from things x
group by x.key
having count(distinct part) = 3
);
But I really want to match the distinct part, not just the count of it, e.g. having distinct part = ['a', 'b', 'c']
. Can I do this in the query or just do it in the application code?
http://sqlfiddle.com/#!17/38b399/6
Essentially, what I'm after is a chunk of rows where all the part
s for a thing
are present. There are eight parts to a thing. They'll be processed and the records in this table deleted. Repeat forever.
Here is the CREATE
script from pgAdmin (with reduced noise):
CREATE TABLE public.things (
id uuid PRIMARY KEY,
key character varying(255) COLLATE pg_catalog."default" NOT NULL,
part character varying(3) COLLATE pg_catalog."default" NOT NULL,
details character varying(1024) COLLATE pg_catalog."default",
timezone character varying(128) COLLATE pg_catalog."default",
client_id uuid,
CONSTRAINT things_client_id_fkey FOREIGN KEY (client_id)
REFERENCES public.clients (id)
);
CREATE INDEX things_client_id_index ON public.things (client_id);
CREATE UNIQUE INDEX unique_things ON public.things (key, part, client_id);
Upvotes: 0
Views: 2209
Reputation: 656872
Basically this can be cast as a case of relational-division.
Queries checking the distinct count of parts per key have to process all rows of the table. Plus, a distinct count is expensive on top of that. Aggregating and comparing arrays is even more expensive.
If most rows qualify, that does not make a big difference as the whole table will be processed anyway. For a small selection, any such approach performs terribly. Alternative query techniques that can use indexes will shine in comparison.
Ideally, you have a separate table of keys with one row per relevant key. Then use something like this:
SELECT *
FROM keys k
WHERE EXISTS (SELECT FROM things WHERE key = k.key AND part = 'a')
AND EXISTS (SELECT FROM things WHERE key = k.key AND part = 'b')
AND EXISTS (SELECT FROM things WHERE key = k.key AND part = 'c');
Table things
needs a multicolumn index on (part, key)
to make it fast.
Even if you don't have a keys
table:
SELECT t1.key
FROM things t1
JOIN things t2 USING (key)
JOIN things t3 USING (key)
WHERE t1.part = 'a'
AND t2.part = 'b'
AND t3.part = 'c';
db<>fiddle here
The best query depends on your exact requirements for filters and result format, and the exact schema definition.
Related:
Upvotes: 2
Reputation: 1269913
I think count(distinct)
does what you want. If you wanted to explicitly specify the parts, you could use:
where t.key in (
select x.key
from things x
group by x.key
having array_agg(distinct part order by part)::text[] = array['a', 'b', 'c']
);
Upvotes: 0