samaswin
samaswin

Reputation: 384

Need to improve count performance in PostgreSQL for this query

I have this query in PostgreSQL:

SELECT COUNT("contacts"."id") 
FROM "contacts" 
  INNER JOIN  "phone_numbers" ON "phone_numbers"."id" = "contacts"."phone_number_id" 
  INNER JOIN "companies" ON "companies"."id" = "contacts"."company_id"
WHERE (
        (
          (
            CAST("phone_numbers"."value" AS VARCHAR) ILIKE '%a%' 
            OR CAST("contacts"."first_name" AS VARCHAR) ILIKE '%a%'
          ) 
          OR CAST("contacts"."last_name" AS VARCHAR) ILIKE '%a%'
        )  
        OR CAST("companies"."name" AS VARCHAR) ILIKE '%a%'
      )

When I run the query it is taking 19secs to run. I need to improve the performance.

Note: I already have the index for the columns.

EXPLAIN ANALYZE report

Finalize Aggregate  (cost=209076.49..209076.54 rows=1 width=8) (actual time=6117.381..6646.477 rows=1 loops=1)
  ->  Gather  (cost=209076.42..209076.48 rows=4 width=8) (actual time=6117.370..6646.473 rows=5 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Partial Aggregate  (cost=209066.42..209066.47 rows=1 width=8) (actual time=5952.710..5952.723 rows=1 loops=5)
              ->  Hash Join  (cost=137685.37..208438.42 rows=251200 width=8) (actual time=3007.055..5945.571 rows=39193 loops=5)
                    Hash Cond: (contacts.company_id = companies.id)
                    Join Filter: (((phone_numbers.value)::text ~~* '%as%'::text) OR ((contacts.first_name)::text ~~* '%as%'::text) OR ((contacts.last_name)::text ~~* '%as%'::text) OR ((companies.name)::text ~~* '%as%'::text))
                    Rows Removed by Join Filter: 763817
                    ->  Parallel Hash Join  (cost=137684.86..201964.34 rows=1003781 width=41) (actual time=3006.633..4596.987 rows=803010 loops=5)
                          Hash Cond: (contacts.phone_number_id = phone_numbers.id)
                          ->  Parallel Seq Scan on contacts  (cost=0.00..59316.85 rows=1003781 width=37) (actual time=11.032..681.124 rows=803010 loops=5)
                          ->  Parallel Hash  (cost=68914.22..68914.22 rows=1295458 width=20) (actual time=1632.770..1632.770 rows=803184 loops=5)
                                Buckets: 65536  Batches: 64  Memory Usage: 4032kB
                                ->  Parallel Seq Scan on phone_numbers  (cost=0.00..68914.22 rows=1295458 width=20) (actual time=10.780..1202.242 rows=803184 loops=5)
                    ->  Hash  (cost=0.30..0.30 rows=4 width=40) (actual time=0.258..0.258 rows=4 loops=5)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on companies  (cost=0.00..0.30 rows=4 width=40) (actual time=0.247..0.248 rows=4 loops=5)
Planning Time: 1.895 ms
Execution Time: 6646.558 ms

Please help me on this performance issue.

I tried FUNCTION row_count_estimate (query text) and it is not giving the exact count.

Solution Tried: I tried the Robert solution and got 16 Secs to run

My Query is:

SELECT Count(id) AS id
FROM   (
              SELECT contacts.id AS id
              FROM   contacts 
              WHERE  (
                            contacts.last_name ilike '%as%')
              OR     (
                            contacts.last_name ilike '%as%')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.phone_number_id IN
                     (
                            SELECT phone_numbers.id AS phone_number_id
                            FROM   phone_numbers
                            WHERE  phone_numbers.value ilike '%as%')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.company_id IN
                     (
                            SELECT companies.id AS company_id
                            FROM   companies
                            WHERE  companies.name ilike '%as%' )) AS ID

Report:

Aggregate  (cost=395890.08..395890.13 rows=1 width=8) (actual time=5942.601..5942.667 rows=1 loops=1)
  ->  Unique  (cost=332446.76..337963.57 rows=1103362 width=8) (actual time=5929.800..5939.658 rows=101989 loops=1)
        ->  Sort  (cost=332446.76..335205.17 rows=1103362 width=8) (actual time=5929.799..5933.823 rows=101989 loops=1)
              Sort Key: contacts.id
              Sort Method: external merge  Disk: 1808kB
              ->  Append  (cost=10.00..220843.02 rows=1103362 width=8) (actual time=1.158..5900.926 rows=101989 loops=1)
                    ->  Gather  (cost=10.00..61935.48 rows=99179 width=8) (actual time=1.158..569.412 rows=101989 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Parallel Seq Scan on contacts  (cost=0.00..61826.30 rows=24795 width=8) (actual time=0.446..477.276 rows=20398 loops=5)
                                Filter: ((last_name)::text ~~* '%as%'::text)
                                Rows Removed by Filter: 782612
                    ->  Nested Loop  (cost=0.84..359.91 rows=402 width=8) (actual time=5292.088..5292.089 rows=0 loops=1)
                          ->  Index Scan using idx_phone_value on phone_numbers  (cost=0.41..64.13 rows=402 width=8) (actual time=5292.087..5292.087 rows=0 loops=1)
                                Index Cond: ((value)::text ~~* '%as%'::text)
                                Rows Removed by Index Recheck: 4015921
                          ->  Index Scan using index_contacts_on_phone_number_id on contacts contacts_1  (cost=0.43..0.69 rows=1 width=16) (never executed)
                                Index Cond: (phone_number_id = phone_numbers.id)
                    ->  Gather  (cost=10.36..75795.48 rows=1003781 width=8) (actual time=26.298..26.331 rows=0 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Hash Join  (cost=0.36..74781.70 rows=250945 width=8) (actual time=3.758..3.758 rows=0 loops=5)
                                Hash Cond: (contacts_2.company_id = companies.id)
                                ->  Parallel Seq Scan on contacts contacts_2  (cost=0.00..59316.85 rows=1003781 width=16) (actual time=0.128..0.128 rows=1 loops=5)
                                ->  Hash  (cost=0.31..0.31 rows=1 width=8) (actual time=0.726..0.727 rows=0 loops=5)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on companies  (cost=0.00..0.31 rows=1 width=8) (actual time=0.726..0.726 rows=0 loops=5)
                                            Filter: ((name)::text ~~* '%as%'::text)
                                            Rows Removed by Filter: 4
Planning Time: 0.846 ms
Execution Time: 5948.330 ms

I tried the below also:

EXPLAIN ANALYZE  SELECT
        count(id) AS id 
    FROM
        (SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            (
                position('as' in LOWER(last_name)) > 0
            ) 
        UNION
        SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            EXISTS (
                SELECT
                    1 
                FROM
                    phone_numbers 
                WHERE
                    (
                        position('as' in LOWER(phone_numbers.value)) > 0
                    ) 
                    AND (
                        contacts.phone_number_id = phone_numbers.id
                    )
            ) 
        UNION 
        SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            EXISTS (
                SELECT
                    1 
                FROM
                    companies 
                WHERE
                    (
                        position('as' in LOWER(companies.name)) > 0
                    ) 
                    AND (
                        contacts.company_id = companies.id
                    )
            ) 
        UNION DISTINCT SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            (
                position('as' in LOWER(first_name)) > 0
            )
    ) AS ID;

Report

Aggregate  (cost=1609467.66..1609467.71 rows=1 width=8) (actual time=1039.249..1039.330 rows=1 loops=1)
  ->  Unique  (cost=1320886.03..1345980.09 rows=5018811 width=8) (actual time=999.363..1030.500 rows=195963 loops=1)
        ->  Sort  (cost=1320886.03..1333433.06 rows=5018811 width=8) (actual time=999.362..1013.818 rows=198421 loops=1)
              Sort Key: contacts.id
              Sort Method: external merge  Disk: 3520kB
              ->  Gather  (cost=10.00..754477.62 rows=5018811 width=8) (actual time=0.581..941.210 rows=198421 loops=1)
                    Workers Planned: 4
                    Workers Launched: 4
                    ->  Parallel Append  (cost=0.00..749448.80 rows=5018811 width=8) (actual time=290.521..943.736 rows=39684 loops=5)
                          ->  Parallel Hash Join  (cost=101469.35..164569.24 rows=334587 width=8) (actual time=724.841..724.843 rows=0 loops=2)
                                Hash Cond: (contacts.phone_number_id = phone_numbers.id)
                                ->  Parallel Seq Scan on contacts  (cost=0.00..59315.91 rows=1003762 width=16) (never executed)
                                ->  Parallel Hash  (cost=78630.16..78630.16 rows=431819 width=8) (actual time=723.735..723.735 rows=0 loops=2)
                                      Buckets: 131072  Batches: 32  Memory Usage: 0kB
                                      ->  Parallel Seq Scan on phone_numbers  (cost=0.00..78630.16 rows=431819 width=8) (actual time=723.514..723.514 rows=0 loops=2)
                                            Filter: ("position"(lower((value)::text), 'as'::text) > 0)
                                            Rows Removed by Filter: 2007960
                          ->  Hash Join  (cost=0.38..74780.48 rows=250940 width=8) (actual time=0.888..0.888 rows=0 loops=1)
                                Hash Cond: (contacts_1.company_id = companies.id)
                                ->  Parallel Seq Scan on contacts contacts_1  (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.009..0.009 rows=1 loops=1)
                                ->  Hash  (cost=0.33..0.33 rows=1 width=8) (actual time=0.564..0.564 rows=0 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on companies  (cost=0.00..0.33 rows=1 width=8) (actual time=0.563..0.563 rows=0 loops=1)
                                            Filter: ("position"(lower((name)::text), 'as'::text) > 0)
                                            Rows Removed by Filter: 4
                          ->  Parallel Seq Scan on contacts contacts_2  (cost=0.00..66844.13 rows=334588 width=8) (actual time=0.119..315.032 rows=20398 loops=5)
                                Filter: ("position"(lower((last_name)::text), 'as'::text) > 0)
                                Rows Removed by Filter: 782612
                          ->  Parallel Seq Scan on contacts contacts_3  (cost=0.00..66844.13 rows=334588 width=8) (actual time=0.510..558.791 rows=32144 loops=3)
                                Filter: ("position"(lower((first_name)::text), 'as'::text) > 0)
                                Rows Removed by Filter: 1306206
Planning Time: 2.115 ms
Execution Time: 1040.620 ms

Upvotes: 0

Views: 277

Answers (1)

RobertG
RobertG

Reputation: 416

It's hard to help you, because I don't have acces to your data. Let me try...

EXPLAIN ANALYZE report shows that:

  1. Yor query doesn't using indexes. Full scan on table phone_numbers tooks 1.202 second, and 0.681 senod on contacts table.

  2. "Rows Removed by Join Filter: 763817". "Parallel Hash Join (cost=137684.86..201964.34 rows=1003781 width=41) (actual time=3006.633..4596.987 rows=803010 loops=5)" . So this query joins ~800k rows and then filter 763k of it.

Maybe you can reverse that. This should speed up (but that needs to be checked).

For example you can test this - rewrite your query in this direction:

    SELECT COUNT( ID)
    FROM
     (
       SELECT "contacts"."id"
        FROM "contacts" 
        Where <filters on contract here>
       union
       SELECT "contacts"."id"
        FROM "contacts"
        where phone_number_id in ( select   "phone_numbers"."id"
                                    from "phone_numbers" 
                                   where <filters on  phone_numbers here> 
                                 ) as A
       union
       SELECT "contacts"."id"
        FROM "contacts"
        where company_id in ( select   "companies"."id"
                                    from "companies" 
                                   where <filters on  companies here> )
                            ) as B

Two indexes: one on column contacts.phone_number_id and another on contacts.company_id might help.

EDIT: It using index on "phone_numbers"."id" with nested loop it tooks 5 seconds. Try to avoid this.

Please check, what it will do for this:

SELECT Count(id) AS id
FROM   (
              SELECT contacts.id AS id
              FROM   contacts 
              WHERE  (
                            contacts.last_name ilike '%as%')
              OR     (
                            contacts.last_name ilike '%as%')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.phone_number_id IN
                     (
                            SELECT to_number(to_char(phone_numbers.id))) /* just for disable index scan for that column */ AS phone_number_id
                            FROM   phone_numbers
                            WHERE  phone_numbers.value ilike '%as%')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.company_id IN
                     (
                            SELECT companies.id AS company_id
                            FROM   companies
                            WHERE  companies.name ilike '%as%' )) AS ID

Aggregate  (cost=419095.35..419095.40 rows=1 width=8) (actual time=13235.986..13236.335 rows=1 loops=1)
  ->  Unique  (cost=346875.23..353155.24 rows=1256002 width=8) (actual time=13211.350..13230.729 rows=195963 loops=1)
        ->  Sort  (cost=346875.23..350015.24 rows=1256002 width=8) (actual time=13211.349..13219.607 rows=195963 loops=1)
              Sort Key: contacts.id
              Sort Method: external merge  Disk: 3472kB
              ->  Append  (cost=2249.63..218658.27 rows=1256002 width=8) (actual time=5927.019..13164.421 rows=195963 loops=1)
                    ->  Gather  (cost=2249.63..48279.58 rows=251838 width=8) (actual time=5927.019..6911.795 rows=195963 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Parallel Bitmap Heap Scan on contacts  (cost=2239.63..48017.74 rows=62960 width=8) (actual time=5861.480..6865.957 rows=39193 loops=5)
                                Recheck Cond: (((first_name)::text ~~* '%as%'::text) OR ((last_name)::text ~~* '%as%'::text))
                                Rows Removed by Index Recheck: 763815
                                Heap Blocks: exact=10860 lossy=6075
                                ->  BitmapOr  (cost=2239.63..2239.63 rows=255705 width=0) (actual time=5917.966..5917.966 rows=0 loops=1)
                                      ->  Bitmap Index Scan on idx_trgm_contacts_first_name  (cost=0.00..1291.57 rows=156527 width=0) (actual time=2972.404..2972.404 rows=4015039 loops=1)
                                            Index Cond: ((first_name)::text ~~* '%as%'::text)
                                      ->  Bitmap Index Scan on idx_trgm_contacts_last_name  (cost=0.00..822.14 rows=99177 width=0) (actual time=2945.560..2945.560 rows=4015038 loops=1)
                                            Index Cond: ((last_name)::text ~~* '%as%'::text)
                    ->  Nested Loop  (cost=81.96..384.33 rows=402 width=8) (actual time=6213.028..6213.028 rows=0 loops=1)
                          ->  Unique  (cost=81.52..83.53 rows=402 width=8) (actual time=6213.027..6213.027 rows=0 loops=1)
                                ->  Sort  (cost=81.52..82.52 rows=402 width=8) (actual time=6213.027..6213.027 rows=0 loops=1)
                                      Sort Key: ((NULLIF((phone_numbers.id)::text, ''::text))::integer)
                                      Sort Method: quicksort  Memory: 25kB
                                      ->  Index Scan using idx_trgm_phone_value on phone_numbers  (cost=0.41..64.13 rows=402 width=8) (actual time=6213.006..6213.006 rows=0 loops=1)
                                            Index Cond: ((value)::text ~~* '%as%'::text)
                                            Rows Removed by Index Recheck: 4015921
                          ->  Index Scan using index_contacts_on_phone_number_id on contacts contacts_1  (cost=0.44..0.70 rows=1 width=16) (never executed)
                                Index Cond: (phone_number_id = (NULLIF((phone_numbers.id)::text, ''::text))::integer)
                    ->  Gather  (cost=10.36..75794.22 rows=1003762 width=8) (actual time=25.691..25.709 rows=0 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Hash Join  (cost=0.36..74780.46 rows=250940 width=8) (actual time=2.653..2.653 rows=0 loops=5)
                                Hash Cond: (contacts_2.company_id = companies.id)
                                ->  Parallel Seq Scan on contacts contacts_2  (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.244..0.244 rows=1 loops=5)
                                ->  Hash  (cost=0.31..0.31 rows=1 width=8) (actual time=0.244..0.244 rows=0 loops=5)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on companies  (cost=0.00..0.31 rows=1 width=8) (actual time=0.244..0.244 rows=0 loops=5)
                                            Filter: ((name)::text ~~* '%as%'::text)
                                            Rows Removed by Filter: 4
Planning Time: 1.458 ms
Execution Time: 13236.949 ms

I tried below,

SELECT Count(id) AS id
FROM   (
              SELECT contacts.id AS id
              FROM   contacts 
              WHERE  (substring(LOWER(contacts.first_name), position('as' in LOWER(first_name)), 2) = 'as')
              OR     (substring(LOWER(contacts.last_name), position('as' in LOWER(last_name)), 2) = 'as')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.phone_number_id IN
                     (
                            SELECT NULLIF(CAST(phone_numbers.id AS text), '')::int AS phone_number_id 
                            FROM   phone_numbers
                            WHERE  (substring(LOWER(phone_numbers.value), position('as' in LOWER(phone_numbers.value)), 2) = 'as'))
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.company_id IN
                     (
                            SELECT companies.id AS company_id
                            FROM   companies
                            WHERE  (substring(LOWER(companies.name), position('as' in LOWER(companies.name)), 2) = 'as') )) AS ID

Aggregate  (cost=508646.88..508646.93 rows=1 width=8) (actual time=1455.892..1455.995 rows=1 loops=1)
  ->  Unique  (cost=447473.09..452792.55 rows=1063892 width=8) (actual time=1431.464..1450.434 rows=195963 loops=1)
        ->  Sort  (cost=447473.09..450132.82 rows=1063892 width=8) (actual time=1431.464..1439.267 rows=195963 loops=1)
              Sort Key: contacts.id
              Sort Method: external merge  Disk: 3472kB
              ->  Append  (cost=10.00..340141.41 rows=1063892 width=8) (actual time=0.391..1370.557 rows=195963 loops=1)
                    ->  Gather  (cost=10.00..84460.02 rows=40050 width=8) (actual time=0.391..983.457 rows=195963 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Parallel Seq Scan on contacts  (cost=0.00..84409.97 rows=10012 width=8) (actual time=1.696..987.285 rows=39193 loops=5)
                                Filter: (("substring"(lower((first_name)::text), "position"(lower((first_name)::text), 'as'::text), 2) = 'as'::text) OR ("substring"(lower((last_name)::text), "position"(lower((last_name)::text), 'as'::text), 2) = 'as'::text))
                                Rows Removed by Filter: 763817
                    ->  Nested Loop  (cost=85188.17..100095.23 rows=20080 width=8) (actual time=364.076..364.125 rows=0 loops=1)
                          ->  HashAggregate  (cost=85187.73..86191.73 rows=20080 width=8) (actual time=364.074..364.123 rows=0 loops=1)
                                Group Key: (NULLIF((phone_numbers.id)::text, ''::text))::integer
                                Batches: 1  Memory Usage: 793kB
                                ->  Gather  (cost=10.00..85137.53 rows=20080 width=8) (actual time=363.976..364.025 rows=0 loops=1)
                                      Workers Planned: 3
                                      Workers Launched: 3
                                      ->  Parallel Seq Scan on phone_numbers  (cost=0.00..85107.45 rows=6477 width=8) (actual time=357.030..357.031 rows=0 loops=4)
                                            Filter: ("substring"(lower((value)::text), "position"(lower((value)::text), 'as'::text), 2) = 'as'::text)
                                            Rows Removed by Filter: 1003980
                          ->  Index Scan using index_contacts_on_phone_number_id on contacts contacts_1  (cost=0.44..0.64 rows=1 width=16) (never executed)
                                Index Cond: (phone_number_id = (NULLIF((phone_numbers.id)::text, ''::text))::integer)
                    ->  Gather  (cost=10.40..75794.26 rows=1003762 width=8) (actual time=6.889..6.910 rows=0 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Hash Join  (cost=0.40..74780.50 rows=250940 width=8) (actual time=0.138..0.139 rows=0 loops=5)
                                Hash Cond: (contacts_2.company_id = companies.id)
                                ->  Parallel Seq Scan on contacts contacts_2  (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.004..0.004 rows=1 loops=5)
                                ->  Hash  (cost=0.35..0.35 rows=1 width=8) (actual time=0.081..0.081 rows=0 loops=5)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on companies  (cost=0.00..0.35 rows=1 width=8) (actual time=0.081..0.081 rows=0 loops=5)
                                            Filter: ("substring"(lower((name)::text), "position"(lower((name)::text), 'as'::text), 2) = 'as'::text)
                                            Rows Removed by Filter: 4
Planning Time: 0.927 ms
Execution Time: 1456.742 ms

Upvotes: 1

Related Questions