Jani

Reputation: 557

Slow index scan with about 10 million rows

I have a table with about 10 million entries, which I'm trying to optimize.

create table houses
(
    id                                serial                          not null
        constraint houses_pkey
            primary key,
    secondary_id                      text                            not null,
    market                            integer                         not null,
    user_id                           uuid                            not null,
    status                            text      default ''::text      not null,
    custom                            boolean   default false,
    constraint houses_unique_constraint
        unique (user_id, market, secondary_id)
);

create index houses_user_index
    on houses (user_id);
create index houses_user_market_index
    on houses (user_id, market);
create index houses_user_status_index
    on houses (user_id, status);

I have a use case where I want to find all distinct non-null user_id and market combinations with the given statuses, together with whether any of the entries in each combination have their custom flag set. I'm using the following query, but it's very slow. Do you have any ideas what I could optimize here? Thank you!

postgres=# EXPLAIN ANALYZE VERBOSE SELECT DISTINCT user_id, market, bool_or(custom) 
FROM houses WHERE user_id IS NOT NULL 
AND status=ANY('{open, sold}') GROUP BY user_id, market;
                                                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1694157.78..1695700.38 rows=154260 width=21) (actual time=9574.290..9704.120 rows=809916 loops=1)
   Output: user_id, market, (bool_or(custom))
   ->  Sort  (cost=1694157.78..1694543.43 rows=154260 width=21) (actual time=9574.289..9625.108 rows=809916 loops=1)
         Output: user_id, market, (bool_or(custom))
         Sort Key: houses.user_id, houses.market, (bool_or(houses.custom))
         Sort Method: external sort  Disk: 24544kB
         ->  GroupAggregate  (cost=0.56..1677700.42 rows=154260 width=21) (actual time=0.396..9290.278 rows=809916 loops=1)
               Output: user_id, market, bool_or(custom)
               Group Key: houses.user_id, houses.market
               ->  Index Scan using houses_user_market_index on public.houses  (cost=0.56..1615726.52 rows=8057507 width=21) (actual time=0.350..8647.480 rows=8114889 loops=1)
                     Output: user_id, market, custom
                     Index Cond: (houses.user_id IS NOT NULL)
                      Filter: (houses.status = ANY ('{open,sold}'::text[]))
                     Rows Removed by Filter: 892609
 Planning time: 0.889 ms
 Execution time: 9729.300 ms
(16 rows)

I have tried adding more indices to cover the custom field as well, but it doesn't seem to make any difference.

Upvotes: 0

Views: 970

Answers (1)

jjanes

Reputation: 44157

No matter what, you are summarizing over 8 million rows. You might be able to improve things but don't expect any magic.

The first thing to do is to drop the DISTINCT, as the GROUP BY already renders that combination of columns distinct (though the planner does not seem to know that). But it looks like that will only save about 0.5 seconds.
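For reference, the same query with the DISTINCT dropped and everything else taken unchanged from the question:

SELECT user_id, market, bool_or(custom)
FROM houses
WHERE user_id IS NOT NULL
  AND status = ANY('{open,sold}')
GROUP BY user_id, market;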

In your existing plan, the index does not provide any usable selectivity. What it does offer is production of the data in an order which suits the GroupAggregate. But it still has to hop all round the table to pull out the additional columns, and I am surprised it finds this an attractive option. Perhaps that is because the table data is highly correlated on user_id, so doing this is mostly visiting the table pages in physical order.

Even if that is the case, it would be better to do an index-only scan, which you can get by creating a covering index on (user_id, market, status, custom). You don't need the INCLUDE feature to have a covering index, so being on v10 is not a problem--you just have to put the columns into the body of the index. It has been recommended to put status earlier in the index, but doing that will wreck the ordering property without providing any meaningful selectivity benefit.
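A sketch of such a covering index (the index name here is arbitrary; the column order follows the paragraph above):

CREATE INDEX houses_user_market_status_custom_index
    ON houses (user_id, market, status, custom);

-- Index-only scans also depend on the visibility map being reasonably
-- up to date, so vacuuming the table after bulk changes helps:
VACUUM (ANALYZE) houses;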

You might get some benefit from parallel execution by lowering parallel_tuple_cost (though in my testing it was actually a harm, not a benefit--maybe due to lousy hardware).
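For example, you could experiment in the current session only (the value 0.01 is just illustrative; the default is 0.1) and re-run the plan to see whether a parallel aggregate gets chosen:

SET parallel_tuple_cost = 0.01;
EXPLAIN ANALYZE VERBOSE
SELECT user_id, market, bool_or(custom)
FROM houses
WHERE user_id IS NOT NULL
  AND status = ANY('{open,sold}')
GROUP BY user_id, market;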

Upvotes: 1
