Reputation: 257
I have a query which looks something like this
SELECT DISTINCT
COALESCE(fa.id, fb.id) AS id,
COALESCE(fa.d_id, fb.d_id) AS d_id,
COALESCE(fa.name, fb.name) AS name,
COALESCE(fa.disabled, fb.disabled) AS disabled,
COALESCE(fa.deleted, fb.deleted) AS deleted
FROM (
SELECT * from table WHERE name LIKE '%'
AND d_id IS NULL AND deleted = false
) fa
FULL JOIN (
SELECT * from table WHERE name LIKE '%'
AND d_id = 1 AND deleted = false
) fb ON fa.name = fb.name
ORDER BY name;
where id
is the primary key of the table and name
is the actual value. d_id
is the id of a user.
Basically, the table has a huge list of names (about 400k+), and if it doesn't have a d_id
, it means it was automatically generated by the system. If it has a d_id
, then it means it was user generated.
What the query is supposed to return is the entire list of default system names plus the names which that certain user has added (in this case, all names generated by the user with a d_id of 1). This is why it performs a full join on itself.
My problem here is that running the query takes too long (about 30000~40000ms on my local psql shell, and ~15000 on live). I've ran EXPLAIN ANALYZE and got this
Unique (cost=8240.78..8272.13 rows=2090 width=42) (actual time=27591.662..28742.062 rows=418018 loops=1)
-> Sort (cost=8240.78..8246.01 rows=2090 width=42) (actual time=27591.659..28504.606 rows=418018 loops=1)
Sort Key: (COALESCE(table.name, table_1.name)), (COALESCE(table.id, table_1.id)), (COALESCE(table.d_id, table_1.d_id)), (COALESCE(table.disabled, table_1.disabled)), (COALESCE(table.deleted, table_1.deleted))
Sort Method: external merge Disk: 13680kB
-> Hash Full Join (cost=8.45..8125.53 rows=2090 width=42) (actual time=11.037..1479.053 rows=418018 loops=1)
Hash Cond: (table.name = table_1.name)
-> Seq Scan on table (cost=0.00..8109.23 rows=2090 width=27) (actual time=0.048..799.822 rows=418018 loops=1)
Filter: ((d_id IS NULL) AND (NOT deleted) AND (name ~~ '%'::citext))
-> Hash (cost=8.44..8.44 rows=1 width=27) (actual time=10.970..10.970 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Index Scan using table__d_id__name__idx on table table_1 (cost=0.42..8.44 rows=1 width=27) (actual time=10.970..10.970 rows=0 loops=1)
Index Cond: (d_id = 1)
Filter: ((NOT deleted) AND (name ~~ '%'::citext))
And while I can't understand it fully, I can tell that most of the reason why it takes too long is in the sort (ORDER BY
) function.
My indexes are as follows:
Indexes:
"table_pkey" PRIMARY KEY, btree (id)
"table__d_id__name__idx" UNIQUE, btree (d_id, name)
"table__name__idx" gist (name gist_trgm_ops)
"table__id__idx" btree (id)
I've tried using different indexes, refactoring the query and playing around with the code, but it still takes just as long. I've tried removing all but the primary key index, and the query somehow sped up to ~23000ms.
Also, in the app, the user can select a letter which will return all results which start with that letter, and the query would look like WHERE name LIKE 'a%'
. Despite also having tens of thousands of results, specifying a starting letter drastically decreases the load time to about 1000-2000ms.
I'm aiming to make the query load in as little as 5000 to 10000ms. Any help or suggestions will be greatly appreciated!
Upvotes: 2
Views: 1918
Reputation: 247950
The large sort is the problem.
You could get rid of the sort if you didn't use DISTINCT
.
I see that in your case the rows are all different anyway, since there are 418018 rows before and after applying Unique
.
Think carefully if duplicates can really happen in your case or if you can do away with DISTINCT
and solve the problem that way.
If you need DISTINCT
, you should increase work_mem
at least for this query, so that the sort can happen in memory rather than spilling to disk. This would improve the performance considerably.
Upvotes: 0
Reputation: 32402
I think you can use an or
instead of a full join
. distinct on (name)
only selects unique names and order by name, d_id
selects system names before user names.
select distinct on (name)
id, d_id, name, disabled, deleted
from table
where deleted = false
and (
d_id is null
or d_id = 1
)
order by name, d_id
Upvotes: 2