Need to improve count performance in PostgreSQL for this query

Question

I have this query in PostgreSQL:

SELECT COUNT("contacts"."id") 
FROM "contacts" 
  INNER JOIN  "phone_numbers" ON "phone_numbers"."id" = "contacts"."phone_number_id" 
  INNER JOIN "companies" ON "companies"."id" = "contacts"."company_id"
WHERE (
        (
          (
            CAST("phone_numbers"."value" AS VARCHAR) ILIKE '%a%' 
            OR CAST("contacts"."first_name" AS VARCHAR) ILIKE '%a%'
          ) 
          OR CAST("contacts"."last_name" AS VARCHAR) ILIKE '%a%'
        )  
        OR CAST("companies"."name" AS VARCHAR) ILIKE '%a%'
      )

When I run the query, it takes 19 seconds. I need to improve its performance.

Note: I already have indexes on the columns involved.
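
The index definitions themselves are not shown in the question. As a minimal sketch, a catalog query like the one below would list them. It assumes the three tables are visible on the current search path; pg_indexes is a standard PostgreSQL system view.

```sql
-- List the indexes defined on the three tables used in the query.
-- indexdef shows the full CREATE INDEX statement, including the access
-- method (btree, gin, gist) and any operator class.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('contacts', 'phone_numbers', 'companies')
ORDER BY tablename, indexname;
```

The access method matters here because a plain btree index cannot serve an ILIKE pattern that starts with a wildcard.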

EXPLAIN ANALYZE report

Finalize Aggregate  (cost=209076.49..209076.54 rows=1 width=8) (actual time=6117.381..6646.477 rows=1 loops=1)
  ->  Gather  (cost=209076.42..209076.48 rows=4 width=8) (actual time=6117.370..6646.473 rows=5 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Partial Aggregate  (cost=209066.42..209066.47 rows=1 width=8) (actual time=5952.710..5952.723 rows=1 loops=5)
              ->  Hash Join  (cost=137685.37..208438.42 rows=251200 width=8) (actual time=3007.055..5945.571 rows=39193 loops=5)
                    Hash Cond: (contacts.company_id = companies.id)
                    Join Filter: (((phone_numbers.value)::text ~~* '%as%'::text) OR ((contacts.first_name)::text ~~* '%as%'::text) OR ((contacts.last_name)::text ~~* '%as%'::text) OR ((companies.name)::text ~~* '%as%'::text))
                    Rows Removed by Join Filter: 763817
                    ->  Parallel Hash Join  (cost=137684.86..201964.34 rows=1003781 width=41) (actual time=3006.633..4596.987 rows=803010 loops=5)
                          Hash Cond: (contacts.phone_number_id = phone_numbers.id)
                          ->  Parallel Seq Scan on contacts  (cost=0.00..59316.85 rows=1003781 width=37) (actual time=11.032..681.124 rows=803010 loops=5)
                          ->  Parallel Hash  (cost=68914.22..68914.22 rows=1295458 width=20) (actual time=1632.770..1632.770 rows=803184 loops=5)
                                Buckets: 65536  Batches: 64  Memory Usage: 4032kB
                                ->  Parallel Seq Scan on phone_numbers  (cost=0.00..68914.22 rows=1295458 width=20) (actual time=10.780..1202.242 rows=803184 loops=5)
                    ->  Hash  (cost=0.30..0.30 rows=4 width=40) (actual time=0.258..0.258 rows=4 loops=5)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on companies  (cost=0.00..0.30 rows=4 width=40) (actual time=0.247..0.248 rows=4 loops=5)
Planning Time: 1.895 ms
Execution Time: 6646.558 ms

Please help me with this performance issue.

I tried a row_count_estimate(query text) function, but it does not return the exact count.
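
For context, a planner-based estimate function of this kind usually looks something like the sketch below. Only the name and signature come from the question; the body is an assumption, modeled on the commonly used helper that parses EXPLAIN output. Because such a function reads the planner's row estimate instead of executing the query, it returns an approximation, which would explain why the count is not exact.

```sql
-- Hypothetical reconstruction: only the name and signature appear in the
-- question. The body follows the common "parse EXPLAIN" pattern.
-- The query text is concatenated and executed, so pass trusted input only.
CREATE OR REPLACE FUNCTION row_count_estimate(query text)
RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
    plan_line      record;
    estimated_rows bigint;
BEGIN
    -- EXPLAIN (without ANALYZE) plans the query but does not run it.
    FOR plan_line IN EXECUTE 'EXPLAIN ' || query LOOP
        -- Take the "rows=" figure from the first plan line that has one,
        -- which is the planner's estimate for the top plan node.
        estimated_rows := substring(plan_line."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN estimated_rows IS NOT NULL;
    END LOOP;
    RETURN estimated_rows;
END;
$$;

-- Usage: pass the row-returning query, not the COUNT query, because the
-- top node of a COUNT plan is an Aggregate that always reports rows=1.
SELECT row_count_estimate(
    'SELECT id FROM contacts WHERE last_name ILIKE ''%as%'''
);
```

The plans in this question show how far such estimates can drift for ILIKE filters: the first plan estimates 251200 rows per worker at the hash join where 39193 were actually returned.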

Solution tried: I tried Robert's solution, and the query took 16 seconds to run.

My query is:

SELECT Count(id) AS id
FROM   (
              SELECT contacts.id AS id
              FROM   contacts 
              WHERE  (
                            contacts.last_name ilike '%as%')
              OR     (
                            contacts.last_name ilike '%as%')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.phone_number_id IN
                     (
                            SELECT phone_numbers.id AS phone_number_id
                            FROM   phone_numbers
                            WHERE  phone_numbers.value ilike '%as%')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.company_id IN
                     (
                            SELECT companies.id AS company_id
                            FROM   companies
                            WHERE  companies.name ilike '%as%' )) AS ID

Report:

Aggregate  (cost=395890.08..395890.13 rows=1 width=8) (actual time=5942.601..5942.667 rows=1 loops=1)
  ->  Unique  (cost=332446.76..337963.57 rows=1103362 width=8) (actual time=5929.800..5939.658 rows=101989 loops=1)
        ->  Sort  (cost=332446.76..335205.17 rows=1103362 width=8) (actual time=5929.799..5933.823 rows=101989 loops=1)
              Sort Key: contacts.id
              Sort Method: external merge  Disk: 1808kB
              ->  Append  (cost=10.00..220843.02 rows=1103362 width=8) (actual time=1.158..5900.926 rows=101989 loops=1)
                    ->  Gather  (cost=10.00..61935.48 rows=99179 width=8) (actual time=1.158..569.412 rows=101989 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Parallel Seq Scan on contacts  (cost=0.00..61826.30 rows=24795 width=8) (actual time=0.446..477.276 rows=20398 loops=5)
                                Filter: ((last_name)::text ~~* '%as%'::text)
                                Rows Removed by Filter: 782612
                    ->  Nested Loop  (cost=0.84..359.91 rows=402 width=8) (actual time=5292.088..5292.089 rows=0 loops=1)
                          ->  Index Scan using idx_phone_value on phone_numbers  (cost=0.41..64.13 rows=402 width=8) (actual time=5292.087..5292.087 rows=0 loops=1)
                                Index Cond: ((value)::text ~~* '%as%'::text)
                                Rows Removed by Index Recheck: 4015921
                          ->  Index Scan using index_contacts_on_phone_number_id on contacts contacts_1  (cost=0.43..0.69 rows=1 width=16) (never executed)
                                Index Cond: (phone_number_id = phone_numbers.id)
                    ->  Gather  (cost=10.36..75795.48 rows=1003781 width=8) (actual time=26.298..26.331 rows=0 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Hash Join  (cost=0.36..74781.70 rows=250945 width=8) (actual time=3.758..3.758 rows=0 loops=5)
                                Hash Cond: (contacts_2.company_id = companies.id)
                                ->  Parallel Seq Scan on contacts contacts_2  (cost=0.00..59316.85 rows=1003781 width=16) (actual time=0.128..0.128 rows=1 loops=5)
                                ->  Hash  (cost=0.31..0.31 rows=1 width=8) (actual time=0.726..0.727 rows=0 loops=5)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on companies  (cost=0.00..0.31 rows=1 width=8) (actual time=0.726..0.726 rows=0 loops=5)
                                            Filter: ((name)::text ~~* '%as%'::text)
                                            Rows Removed by Filter: 4
Planning Time: 0.846 ms
Execution Time: 5948.330 ms

I also tried the following:

EXPLAIN ANALYZE  SELECT
        count(id) AS id 
    FROM
        (SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            (
                position('as' in LOWER(last_name)) > 0
            ) 
        UNION
        SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            EXISTS (
                SELECT
                    1 
                FROM
                    phone_numbers 
                WHERE
                    (
                        position('as' in LOWER(phone_numbers.value)) > 0
                    ) 
                    AND (
                        contacts.phone_number_id = phone_numbers.id
                    )
            ) 
        UNION 
        SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            EXISTS (
                SELECT
                    1 
                FROM
                    companies 
                WHERE
                    (
                        position('as' in LOWER(companies.name)) > 0
                    ) 
                    AND (
                        contacts.company_id = companies.id
                    )
            ) 
        UNION DISTINCT SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            (
                position('as' in LOWER(first_name)) > 0
            )
    ) AS ID;

Report:

Aggregate  (cost=1609467.66..1609467.71 rows=1 width=8) (actual time=1039.249..1039.330 rows=1 loops=1)
  ->  Unique  (cost=1320886.03..1345980.09 rows=5018811 width=8) (actual time=999.363..1030.500 rows=195963 loops=1)
        ->  Sort  (cost=1320886.03..1333433.06 rows=5018811 width=8) (actual time=999.362..1013.818 rows=198421 loops=1)
              Sort Key: contacts.id
              Sort Method: external merge  Disk: 3520kB
              ->  Gather  (cost=10.00..754477.62 rows=5018811 width=8) (actual time=0.581..941.210 rows=198421 loops=1)
                    Workers Planned: 4
                    Workers Launched: 4
                    ->  Parallel Append  (cost=0.00..749448.80 rows=5018811 width=8) (actual time=290.521..943.736 rows=39684 loops=5)
                          ->  Parallel Hash Join  (cost=101469.35..164569.24 rows=334587 width=8) (actual time=724.841..724.843 rows=0 loops=2)
                                Hash Cond: (contacts.phone_number_id = phone_numbers.id)
                                ->  Parallel Seq Scan on contacts  (cost=0.00..59315.91 rows=1003762 width=16) (never executed)
                                ->  Parallel Hash  (cost=78630.16..78630.16 rows=431819 width=8) (actual time=723.735..723.735 rows=0 loops=2)
                                      Buckets: 131072  Batches: 32  Memory Usage: 0kB
                                      ->  Parallel Seq Scan on phone_numbers  (cost=0.00..78630.16 rows=431819 width=8) (actual time=723.514..723.514 rows=0 loops=2)
                                            Filter: ("position"(lower((value)::text), 'as'::text) > 0)
                                            Rows Removed by Filter: 2007960
                          ->  Hash Join  (cost=0.38..74780.48 rows=250940 width=8) (actual time=0.888..0.888 rows=0 loops=1)
                                Hash Cond: (contacts_1.company_id = companies.id)
                                ->  Parallel Seq Scan on contacts contacts_1  (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.009..0.009 rows=1 loops=1)
                                ->  Hash  (cost=0.33..0.33 rows=1 width=8) (actual time=0.564..0.564 rows=0 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on companies  (cost=0.00..0.33 rows=1 width=8) (actual time=0.563..0.563 rows=0 loops=1)
                                            Filter: ("position"(lower((name)::text), 'as'::text) > 0)
                                            Rows Removed by Filter: 4
                          ->  Parallel Seq Scan on contacts contacts_2  (cost=0.00..66844.13 rows=334588 width=8) (actual time=0.119..315.032 rows=20398 loops=5)
                                Filter: ("position"(lower((last_name)::text), 'as'::text) > 0)
                                Rows Removed by Filter: 782612
                          ->  Parallel Seq Scan on contacts contacts_3  (cost=0.00..66844.13 rows=334588 width=8) (actual time=0.510..558.791 rows=32144 loops=3)
                                Filter: ("position"(lower((first_name)::text), 'as'::text) > 0)
                                Rows Removed by Filter: 1306206
Planning Time: 2.115 ms
Execution Time: 1040.620 ms
