Ryon

Reputation: 41

count(*) and count(*) over () performance difference in greenplum/postgreSQL

I want to query the detail data together with the total number of detail rows. In general this requires two SQL statements: one like select col1, col2, col3 from tb limit 50 offset 0, and another like select count(*) from tb. To reduce the number of queries, I use "select col1, col2, col3, count(*) over () total_count from tb" instead.

But I find that the latter (select col1, col2, col3, count(*) over () total_count from tb limit 50 offset 0) is sometimes much slower than the former. Why? What is the performance difference between the two approaches?

Here is the EXPLAIN ANALYZE output for each query; I have included only the parts that differ.

  1. select col1 from tb limit 20 offset 0

     Limit (cost=743828.93..747114.71 rows=200 width=287) (actual time=2290.254..2390.969 rows=200 loops=1)
       -> Gather Motion 20:1 (slice3; segments: 20) (cost=743828.93..747114.71 rows=200 width=287) (actual time=2290.248..2390.933 rows=200 loops=1)
         -> Limit (cost=743828.93..747110.71 rows=10 width=287) (actual time=2285.350..2385.507 rows=200 loops=1)

  2. select count(*) from tb

     Aggregate (cost=6123212.33..6123212.34 rows=1 width=8) (actual time=2339.615..2339.616 rows=1 loops=1)
       -> Gather Motion 20:1 (slice2; segments: 20) (cost=6123212.10..6123212.32 rows=1 width=8) (actual time=635.672..2339.433 rows=20 loops=1)
         -> Aggregate (cost=6123212.10..6123212.11 rows=1 width=8) (actual time=633.879..633.879 rows=1 loops=1)

  3. select col, count(*) over() from tb limit 20 offset 0

     Limit (cost=743828.93..747113.21 rows=200 width=287) (actual time=57787.942..57788.339 rows=200 loops=1)
       -> WindowAgg (cost=743828.93..307341631.82 rows=18670608 width=287) (actual time=57787.936..57788.310 rows=200 loops=1)
         -> Gather Motion 20:1 (slice3; segments: 20) (cost=743828.93..307014896.18 rows=18670608 width=287) (actual time=1514.368..32796.802 rows=18735134 loops=1)

Upvotes: 4

Views: 1637

Answers (1)

Which option is better for you can only be determined for sure after seeing your execution plans.

But if you run ANALYZE frequently enough, you can use:

SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = 'tb'::regclass;

It's extremely fast, but it gives an approximate count rather than an exact one. Its accuracy depends on how recently ANALYZE (or autovacuum's analyze) last ran on the table.
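If an approximate total is acceptable, a sketch of how the estimate above could be attached to the paged rows in a single statement (column names are taken from the question; the subquery alias names are my own):

```sql
-- Sketch: page through the detail rows and attach the planner's row estimate.
-- The estimate comes from pg_class.reltuples, so it is only as fresh as the
-- last ANALYZE; it avoids scanning the table just to produce a total.
SELECT t.col1, t.col2, t.col3, e.approx_total
FROM (SELECT col1, col2, col3
      FROM tb
      LIMIT 50 OFFSET 0) AS t
CROSS JOIN (SELECT reltuples::bigint AS approx_total
            FROM pg_class
            WHERE oid = 'tb'::regclass) AS e;
```

The cross join is against a single-row subquery, so it simply repeats the estimate on each of the 50 detail rows.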

The reason select count(*) from tb is slow is related to PostgreSQL's MVCC implementation. Because multiple transactions can see different states of the data, there is no straightforward way for COUNT(*) to summarize data across the whole table; PostgreSQL must, in some sense, walk through all rows. This normally results in a sequential scan reading information about every row in the table. See Slow counting in PostgreSQL.
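As for why the count(*) over () version is so much slower than running the two queries separately: in plan 3 the WindowAgg sits above the Gather Motion, so all ~18.7 million rows (with their full width) are shipped to the master before the LIMIT applies, whereas plan 2 aggregates on each segment and ships only 20 partial counts. If an exact total is required in one statement, one pattern worth trying (a sketch, not guaranteed to be faster on every plan) is to keep the count as a plain aggregate in its own subquery:

```sql
-- Sketch: exact total via a per-segment aggregate, joined to the paged rows.
-- The count subquery still scans tb, but each segment returns one partial
-- count instead of streaming every row through a window function.
SELECT t.col1, t.col2, t.col3, c.total_count
FROM (SELECT col1, col2, col3
      FROM tb
      LIMIT 50 OFFSET 0) AS t
CROSS JOIN (SELECT count(*) AS total_count FROM tb) AS c;
```

Check EXPLAIN ANALYZE on your own data to confirm the planner actually produces the cheaper shape here.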

Upvotes: 0
