Tran Hung
Tran Hung

Reputation: 53

Create index on EXISTS condition

My table structure:

table_a(id, company_id, approval_status, is_locked)
table_b(tba_id, status)

My Query:

SELECT COUNT(id) filter (WHERE approval_status = 2 
AND is_locked = true AND EXISTS 
(SELECT 1 from table_b WHERE table_b.tba_id = table_a.id 
AND table_b.status = 2) 
FROM table_a
GROUP BY company_id

I currently have the following index, but the performance is still slow:

CREATE INDEX multiple_filter_index ON table_a (approval_status, is_locked)

Can the performance of this query be improved by adding better indexes?

Here is the Query plan:

HashAggregate  (cost=463013.07..463013.10 rows=2 width=11) (actual time=47632.476..47632.476 rows=2 loops=1)
  Group Key: table_a.company_id
  ->  Seq Scan on table_a  (cost=0.00..3064.62 rows=100062 width=11) (actual time=0.003..23.326 rows=100062 loops=1)
  SubPlan 1
    ->  Seq Scan on table_b  (cost=0.00..477.27 rows=104 width=0) (actual time=1.430..1.430 rows=0 loops=33144)
          Filter: ((tba_id = table_a.id) AND (status = 2))
          Rows Removed by Filter: 17411
  SubPlan 2
    ->  Seq Scan on table_b table_b_1  (cost=0.00..433.73 rows=5820 width=4) (never executed)
          Filter: (status = 2)
Planning time: 0.902 ms
Execution time: 47632.565 ms

Upvotes: 3

Views: 765

Answers (2)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522396

Your current execution plan shows that Postgres is not using the index you defined at all. Rather, it is just doing two sequential scans of each table, which won't be particularly efficient if those tables are large.

First, AFAIK your query will be executed the same as this:

SELECT COUNT(id)
FROM table_a
WHERE
    approval_status = 2 AND
    is_locked = true AND
    EXISTS (SELECT 1 from table_b WHERE table_b.tba_id = table_a.id AND table_b.status = 2)
GROUP BY company_id;

That is to say, the Postgres filter will really just behave the same as if that logic were in a formal WHERE clause.

I would suggest creating an index on each of the two tables:

CREATE INDEX table_a_idx ON table_a (approval_status, is_locked, company_id);
CREATE INDEX table_b_idx ON table_b (status, tba_id);

The reasoning for the table_a_idx index is that we want to eliminate as many records as possible using the approval_status and is_locked filters. I also included the company_id in this index, to cover the GROUP BY column, hopefully avoiding the need to do an additional disk read after traversing the index.

The table_b_idx exists to speed up the EXISTS clause of your query.

I would also recommend that you use COUNT(*) instead of COUNT(id).

Upvotes: 1

Paul Maxwell
Paul Maxwell

Reputation: 35603

Try moving some filtering logic into a join

SELECT
    company_id
  , COUNT(CASE
        WHEN approval_status = 2 AND
            is_locked = TRUE AND
            b.tba_id IS NOT NULL
        THEN id
    END)
FROM table_a
LEFT JOIN (
    SELECT DISTINCT tba_id 
    FROM table_b
    ) b on b.tba_id = table_a.id
GROUP BY
    company_id

Upvotes: 0

Related Questions