Kevin Gill

Reputation: 437

Postgres: Why does select count(*) take so long

TL;DR

I had a problem table that was very slow to query. I ran pg_repack on it to rebuild the table, but it was still slow afterwards; as far as I can tell, pg_repack did not actually rebuild it. In the end I had to dump and reload the table via pg_dump.

analyse verbose shows lots of dead rows:

# analyse verbose payslip;
INFO:  analyzing "public.payslip"
INFO:  "payslip": scanned 30000 of 458337 pages, containing 8732 live rows and 400621 dead rows; 8732 rows in sample, 133407 estimated total rows
ANALYZE

Autovacuum was not working. This article identifies the potential causes:

https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/
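For anyone landing here with the same symptoms, here are a few diagnostic queries (a sketch of my own, not part of the original investigation) that would have caught this much earlier. They show which tables carry dead rows and what might be holding the xmin horizon back:

-- Tables carrying dead rows, and when autovacuum last touched them
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Orphaned prepared transactions (the culprit in my case)
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;

-- Long-running or idle-in-transaction sessions can block vacuum the same way
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start < now() - interval '1 hour';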

Original thread

I have a table with 140k rows which grows by about 500 rows per week.

A few weeks ago I investigated the queries on the table and found that all of them were slow; for example, select count(*) took 6 seconds. I rebuilt the table using pg_repack and assumed that was the end of it. Today I noticed that the table is slow again: 3 seconds for select count(*).

There are 138 tables in the database, and only one other table, with 1.3 million rows, takes more than a second to do a select count(*).

I am wondering whether there is corruption, whether this is a bug in Postgres, or whether there is a tuning issue.

Information

Here is the count via psql (today)

# select count(*) from payslip;
 count  
--------
 140327
(1 row)

Time: 3255.772 ms (00:03.256)
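(Side note: while the real count is this slow, an approximate count can be read from the planner statistics, and comparing reltuples with relpages also makes any bloat visible. A sketch, assuming the statistics are reasonably fresh:)

-- Approximate row count and page count from planner statistics
SELECT reltuples::bigint AS approx_rows, relpages
FROM pg_class
WHERE relname = 'payslip';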

Here is the query plan

# explain select count(*) from payslip;
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Aggregate  (cost=142820.48..142820.49 rows=1 width=8)
   ->  Bitmap Heap Scan on payslip  (cost=22543.92..142479.77 rows=136285 width=0)
         ->  Bitmap Index Scan on payslip_idx3  (cost=0.00..22509.84 rows=136285 width=0)
(3 rows)

This is the data model (truncated).

                         Table "public.payslip"
          Column          |          Type          | Collation | Nullable |                   Default                    
--------------------------+------------------------+-----------+----------+----------------------------------------------
 taxregno                 | character varying(20)  |           | not null | 
 worksid                  | character varying(8)   |           | not null | 
 cutoffdate               | character(10)          |           | not null | 
 productionid             | integer                |           | not null | 
... 

(roughly 50 further columns omitted)

Indexes:
    "payslip_pkey" PRIMARY KEY, btree (taxregno, worksid, cutoffdate, productionid)
    "payslip_k1" UNIQUE, btree (taxregno, worksid, cutoffdate, productionid)
    "payslip_idx3" btree (worksid)
    "payslip_idx4" btree (ppsnumber)

The Postgres version is currently 11. This database has been migrated over 10-plus years from Postgres 8 to the current version; I just followed the instructions in the various Ubuntu upgrades.

$ psql -V
psql (PostgreSQL) 11.3 (Ubuntu 11.3-1.pgdg14.04+1)

The server is running on a Linode Linux box with SSD storage. I set the page costs in postgresql.conf to reflect an SSD.

#seq_page_cost = 1.0            # measured on an arbitrary scale
random_page_cost = 1.0          # same scale as above
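(For reference, a sketch of how these settings can be inspected and changed at runtime instead of editing postgresql.conf by hand; the value shown is just the one above, not a recommendation.)

SHOW seq_page_cost;
SHOW random_page_cost;

ALTER SYSTEM SET random_page_cost = 1.0;   -- written to postgresql.auto.conf
SELECT pg_reload_conf();                   -- picked up without a restart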

Today

Unfortunately, this is a production server and I need to sort out the performance problem in the short term, so I ran pg_repack again just now.

After pg_repack

# select count(*) from payslip;
 count  
--------
 140327
(1 row)

Time: 26.216 ms

# explain select count(*) from payslip;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Aggregate  (cost=10974.09..10974.10 rows=1 width=8)
   ->  Seq Scan on payslip  (cost=0.00..10623.27 rows=140327 width=0)
(2 rows)
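(For anyone reproducing this: a sketch of how the physical effect of a rebuild can be verified. pg_table_size() is built in; pgstattuple is a contrib extension that may need to be installed first.)

SELECT pg_size_pretty(pg_table_size('payslip'))   AS table_size,
       pg_size_pretty(pg_indexes_size('payslip')) AS index_size;

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT tuple_percent, dead_tuple_percent, free_percent
FROM pgstattuple('payslip');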

As requested by a_horse_with_no_name in the comments, here is further information. As mentioned above, this is against the table after it was rebuilt.

# explain (analyze, buffers, timing) select count(*) from payslip;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12850.75..12850.76 rows=1 width=8) (actual time=42.070..42.071 rows=1 loops=1)
   Buffers: shared hit=11022
   ->  Seq Scan on payslip  (cost=0.00..12485.00 rows=146300 width=0) (actual time=0.010..31.669 rows=140327 loops=1)
         Buffers: shared hit=11022
 Planning Time: 0.102 ms
 Execution Time: 42.115 ms
(6 rows)

Update after one week.

It's been a quiet week here. The table grew by 250 rows. select count(*) slowed from 0.04 seconds to 0.7 seconds, and the query plan changed back from the faster sequential scan to the slower bitmap index scan.

select count(*) from payslip;
 140572

Time: 643.144 ms

Here is the detail.

explain (analyze, buffers, timing) select count(*) from payslip;
 Aggregate  (cost=108251.57..108251.58 rows=1 width=8) (actual time=718.015..718.016 rows=1 loops=1)
   Buffers: shared hit=169407
   ->  Bitmap Heap Scan on payslip  (cost=8522.42..107900.14 rows=140572 width=0) (actual time=229.612..707.319 rows=140572 loops=1)
         Heap Blocks: exact=76839 lossy=84802
         Buffers: shared hit=169407
         ->  Bitmap Index Scan on payslip_idx3  (cost=0.00..8487.28 rows=140572 width=0) (actual time=205.228..205.228 rows=2212168 loops=1)
               Buffers: shared hit=7757
 Planning Time: 0.115 ms
 Execution Time: 718.069 ms

Update after two weeks

It is now two weeks since I rebuilt the table. This week the table grew by 340 rows, and the select count(*) time rose from 0.6 seconds to 2 seconds.

select count(*) from payslip;
 count  
--------
 140914
(1 row)

Time: 2077.577 ms (00:02.078)

There is no change to the query plan; execution is just much slower.

explain (analyze, buffers, timing) select count(*) from payslip;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=138089.00..138089.01 rows=1 width=8) (actual time=2068.305..2068.305 rows=1 loops=1)
   Buffers: shared hit=8 read=324086 written=1
   ->  Bitmap Heap Scan on payslip  (cost=17071.92..137736.72 rows=140914 width=0) (actual time=270.512..2056.755 rows=140914 loops=1)
         Heap Blocks: exact=8198 lossy=301091
         Buffers: shared hit=8 read=324086 written=1
         ->  Bitmap Index Scan on payslip_idx3  (cost=0.00..17036.69 rows=140914 width=0) (actual time=268.801..268.801 rows=4223367 loops=1)
               Buffers: shared read=14794
 Planning Time: 0.164 ms
 Execution Time: 2068.623 ms
(9 rows)

Time: 2069.567 ms (00:02.070)

The index chosen (idx3) is an index with many duplicate values, only about 22k unique values across 140k records. The bitmap index scan indicates that over 4 million index rows were scanned this week (after about 400 insertions), versus 2 million for the same query last week, which is in line with the performance degradation.

Information from the index maintenance queries (suggested by richyen)

 relname | rows_in_bytes | num_rows | number_of_indexes | unique | single_column | multi_column 
---------+---------------+----------+-------------------+--------+---------------+--------------
 payslip | 138 kB        |   140914 |                 4 | Y      |             2 |            2



 schemaname | tablename |  indexname   | num_rows | table_size | index_size | unique | number_of_scans | tuples_read | tuples_fetched 
------------+-----------+--------------+----------+------------+------------+--------+-----------------+-------------+----------------
 public     | payslip   | payslip_k1   |   140914 | 2420 MB    | 244 MB     | Y      |           39720 |  3292501603 |       14295183
 public     | payslip   | payslip_idx4 |   140914 | 2420 MB    | 156 MB     | N      |           43013 |  9529447977 |       34943724
 public     | payslip   | payslip_idx3 |   140914 | 2420 MB    | 116 MB     | N      |           42812 |  3067603558 |       72358879
 public     | payslip   | payslip_pkey |   140914 | 2420 MB    | 244 MB     | Y      |            3540 |   203676311 |        4213496
(4 rows)


  size   |             idx1             |              idx2               |         idx3         | idx4 
---------+------------------------------+---------------------------------+----------------------+------
 488 MB  | payslip_pkey                 | payslip_k1                      |                      | 
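Independently of the main problem, that output shows payslip_pkey and payslip_k1 covering exactly the same columns, so one of them is pure write overhead. A sketch of removing the redundant one (CONCURRENTLY avoids taking an exclusive lock on a production table; the drop will fail if a foreign key or other object depends on that particular index):

DROP INDEX CONCURRENTLY payslip_k1;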

At this stage, I redesigned the table's indexes. I made the primary key an integer value from a sequence and included that sequential number in all indexes to make them unique.
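Roughly, the change looked like this (a sketch only: the column and index names are illustrative, not the exact DDL I ran, and any foreign keys referencing the old primary key would need to be handled as well):

-- New surrogate key; bigserial creates the sequence and fills existing rows
ALTER TABLE payslip ADD COLUMN payslip_id bigserial;

-- Rebuild the indexes with the sequence number appended to make them unique,
-- CONCURRENTLY so writes are not blocked
CREATE UNIQUE INDEX CONCURRENTLY payslip_pkey_new ON payslip (payslip_id);
CREATE UNIQUE INDEX CONCURRENTLY payslip_idx3_new ON payslip (worksid, payslip_id);
-- ... likewise for the remaining indexes ...

-- Swap the primary key over to the new index
ALTER TABLE payslip DROP CONSTRAINT payslip_pkey;
ALTER TABLE payslip ADD CONSTRAINT payslip_pkey PRIMARY KEY USING INDEX payslip_pkey_new;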

Since the indexes were rebuilt, select count(*) has gone back to doing a sequential scan. I will have to wait for the table to grow a little to see whether the query goes back to reading millions of rows.

explain (analyze, buffers, timing) select count(*) from payslip;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=312850.42..312850.43 rows=1 width=8) (actual time=1348.241..1348.242 rows=1 loops=1)
   Buffers: shared hit=199941 read=111148
   ->  Seq Scan on payslip  (cost=0.00..312498.14 rows=140914 width=0) (actual time=209.227..1336.035 rows=140914 loops=1)
         Buffers: shared hit=199941 read=111148
 Planning Time: 0.069 ms
 Execution Time: 1348.289 ms
(6 rows)

Index information is now

 schemaname | tablename |  indexname   | num_rows | table_size | index_size | unique | number_of_scans | tuples_read | tuples_fetched 
------------+-----------+--------------+----------+------------+------------+--------+-----------------+-------------+----------------
 public     | payslip   | payslip_pkey |   140914 | 2430 MB    | 91 MB      | Y      |               0 |           0 |              0
 public     | payslip   | payslip_idx2 |   140914 | 2430 MB    | 202 MB     | Y      |               0 |           0 |              0
 public     | payslip   | payslip_idx4 |   140914 | 2430 MB    | 128 MB     | Y      |               0 |           0 |              0
 public     | payslip   | payslip_idx3 |   140914 | 2430 MB    | 128 MB     | N      |               0 |           0 |              0
(4 rows)

Problem Solved

I finally worked out the solution. My mistake was assuming that pg_repack rebuilt the table, as the name suggests. It didn't: the table was still completely fragmented, full of dead rows.

For some reason I don't understand, with the fragmented table PostgreSQL decided to do a sequential scan instead of an index scan.
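My best guess with hindsight: a plain count(*) can only use an index via an index-only scan, and that is only attractive when most pages are marked all-visible in the visibility map. With autovacuum blocked, hardly any pages were, which can be checked roughly like this:

-- How much of the table the planner believes is all-visible
SELECT relpages, relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'payslip';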

This is what I should have looked at.

# analyse verbose payslip;
INFO:  analyzing "public.payslip"
INFO:  "payslip": scanned 30000 of 458337 pages, containing 8732 live rows and 400621 dead rows; 8732 rows in sample, 133407 estimated total rows
ANALYZE

The problem was solved very quickly by dumping the table with pg_dump and reloading it.

I investigated the problem further, and found this excellent article.

https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/

There were two dead prepared transactions on the database which prevented autovacuum from working properly.

SELECT gid, prepared, owner, database, transaction AS xmin
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

                 gid                  |           prepared            | owner | database |  xmin
--------------------------------------+-------------------------------+-------+----------+---------
 _sa_4f7780bb6653ccb70ddaf2143ac7a232 | 2019-08-12 13:00:11.738766+01 | kevin | kevin    | 1141263
 _sa_0db277aebcb444884763fe6245d702fe | 2019-09-19 14:00:11.977378+01 | kevin | kevin    | 2830229
(2 rows)
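As I understand the article, the fix for the root cause is to get rid of those orphaned prepared transactions so that vacuum can advance its xmin horizon again. Roughly (gids taken from the output above; only roll them back if you are sure nothing will ever COMMIT PREPARED them):

ROLLBACK PREPARED '_sa_4f7780bb6653ccb70ddaf2143ac7a232';
ROLLBACK PREPARED '_sa_0db277aebcb444884763fe6245d702fe';

-- afterwards vacuum can finally remove the dead rows
VACUUM (VERBOSE, ANALYZE) payslip;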

Thanks to everyone for their help.

Upvotes: 4

Views: 5875

Answers (1)

richyen

Reputation: 10078

The changes from last week to this week indicate that a lot of your data for payslip is no longer in cache (see the changes in hit and read in your Buffers: sections).

Note also that your Heap Blocks are getting more lossy, meaning that you probably have work_mem set too low for the operation.

You should probably increase work_mem to at least 25MB for now (the latest stats indicate accessing ~309k page blocks). However, you'll likely need to increase it further as the table grows. work_mem can be set on a per-session basis, so you could set it based on a prediction of the table size. I don't love that idea, but it's also not recommended to set work_mem arbitrarily high as a global setting, since that could lead to over-allocation of memory.
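For example, something along these lines (a sketch; adjust the value to the table):

SET work_mem = '25MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM payslip;
RESET work_mem;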

I'm not very clear on the internals of pg_repack, but I wonder if you saw the post-repack performance boost because the data ended up cached in memory and gets flushed out over time.

Disclosure: I work for EnterpriseDB (EDB)

Upvotes: 1
