Reputation: 557
I have a table with about 10 million entries, which I'm trying to optimize.
create table houses
(
id serial not null
constraint houses_pkey
primary key,
secondary_id text not null,
market integer not null,
user_id uuid not null,
status text default ''::text not null,
custom boolean default false,
constraint houses_unique_constraint
unique (user_id, market, secondary_id)
);
create index houses_user_index
on houses (user_id);
create index houses_user_market_index
on houses (user_id, market);
create index houses_user_status_index
on houses (user_id, status);
I have a use case where I want to find all distinct non-null user_id and market combinations with given statuses, together with whether any of the matching entries has its custom flag set. I'm using the following query, but it's very slow. Do you have any ideas what I could optimize here? Thank you!
postgres=# EXPLAIN ANALYZE VERBOSE SELECT DISTINCT user_id, market, bool_or(custom)
FROM houses WHERE user_id IS NOT NULL
AND status=ANY('{open, sold}') GROUP BY user_id, market;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=1694157.78..1695700.38 rows=154260 width=21) (actual time=9574.290..9704.120 rows=809916 loops=1)
Output: user_id, market, (bool_or(custom))
-> Sort (cost=1694157.78..1694543.43 rows=154260 width=21) (actual time=9574.289..9625.108 rows=809916 loops=1)
Output: user_id, market, (bool_or(custom))
Sort Key: houses.user_id, houses.market, (bool_or(houses.custom))
Sort Method: external sort Disk: 24544kB
-> GroupAggregate (cost=0.56..1677700.42 rows=154260 width=21) (actual time=0.396..9290.278 rows=809916 loops=1)
Output: user_id, market, bool_or(custom)
Group Key: houses.user_id, houses.market
-> Index Scan using houses_user_market_index on public.houses (cost=0.56..1615726.52 rows=8057507 width=21) (actual time=0.350..8647.480 rows=8114889 loops=1)
Output: user_id, market, custom
Index Cond: (houses.user_id IS NOT NULL)
Filter: (houses.status = ANY ('{open,sold}'::text[]))
Rows Removed by Filter: 892609
Planning time: 0.889 ms
Execution time: 9729.300 ms
(16 rows)
I have tried adding more indexes to cover the custom field as well, but it doesn't seem to make any difference.
Upvotes: 0
Views: 970
Reputation: 44157
No matter what, you are summarizing over 8 million rows. You might be able to improve things but don't expect any magic.
The first thing to do is to drop the DISTINCT, as the GROUP BY already renders that set of columns distinct (though the planner does not seem to know that). But it looks like that will only save about 0.5 seconds.
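For reference, your query with the redundant DISTINCT removed and everything else unchanged:

EXPLAIN ANALYZE VERBOSE
SELECT user_id, market, bool_or(custom)
FROM houses
WHERE user_id IS NOT NULL
  AND status = ANY('{open, sold}')
GROUP BY user_id, market;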
In your existing plan, the index does not provide any usable selectivity. What it does offer is production of the data in an order which suits the GroupAggregate. But it still has to hop all round the table to pull out the additional columns, and I am surprised it finds this an attractive option. Perhaps that is because the table data is highly correlated on user_id, so doing this is mostly visiting the table pages in physical order.
Even if that is the case, it would be better to do an index-only scan, which you can get by creating a covering index on (user_id, market, status, custom), something like the one sketched below. You don't need the INCLUDE feature to have a covering index, so being on v10 is not a problem--you just have to put the columns into the body of the index. It has been recommended to put status earlier in the index, but doing that would wreck the ordering property without providing any meaningful selectivity benefit.
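Something along these lines (the index name is just a placeholder):

create index houses_user_market_status_custom_index
    on houses (user_id, market, status, custom);

With that index in place (and the table vacuumed, so the visibility map is reasonably up to date), the planner can answer the query from the index alone instead of visiting the heap for every row.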
You might get some benefit from parallel execution by lowering parallel_tuple_cost (although in my case it was actually a harm rather than a benefit--maybe due to lousy hardware).
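For example, to try it in a single session before changing anything globally (0.01 here is just an illustrative value; the default is 0.1):

SET parallel_tuple_cost = 0.01;
-- then rerun the EXPLAIN ANALYZE above and see whether a parallel plan is chosen and whether it actually helps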
Upvotes: 1