Swaranga Sarma

Reputation: 13433

Fastest way to count exact number of rows in a very large table?

I have come across articles that state that SELECT COUNT(*) FROM TABLE_NAME will be slow when the table has lots of rows and lots of columns.

I have a table that might contain even billions of rows [it has approximately 15 columns]. Is there a better way to get the EXACT count of the number of rows of a table?

Please consider the following before your answer:

Upvotes: 288

Views: 622875

Answers (27)


Reputation: 11

Use COUNT_BIG() to fetch the count of records in a very large table.


Upvotes: 1


Reputation: 43666

With SQL Server 2019, you can use APPROX_COUNT_DISTINCT, which:

returns the approximate number of unique non-null values in a group

and from the docs:

APPROX_COUNT_DISTINCT is designed for use in big data scenarios and is optimized for the following conditions:

  • Access of data sets that are millions of rows or higher and
  • Aggregation of a column or columns that have many distinct values

Also, the function

  • implementation guarantees up to a 2% error rate within a 97% probability
  • requires less memory than an exhaustive COUNT DISTINCT operation
  • given the smaller memory footprint is less likely to spill memory to disk compared to a precise COUNT DISTINCT operation.

The algorithm behind the implementation is HyperLogLog.

Upvotes: 0

Ravi Kiran

Reputation: 585

In a very large table for me,


takes 37 seconds whereas


takes 4 seconds.

Upvotes: 3

Justus Swanevelder

Reputation: 1

Maybe a bit late but this might help others for MSSQL

MAX(RowNumber) FROM RecordCount

Upvotes: -3


Reputation: 1141

select rows from sysindexes
where id = Object_ID('TableName') and indid <2

Upvotes: 1


Reputation: 1

In SQL Server 2016, I can open the table properties and select the 'Storage' tab, which shows the row count, the disk space used by the table, the index space used, etc.

Upvotes: -2


Reputation: 24009

With PostgreSQL:

SELECT reltuples AS approximate_row_count FROM pg_class WHERE relname = 'table_name'

Upvotes: 0


Reputation: 2511

I got this script from another StackOverflow question/answer:

SELECT SUM(p.rows) FROM sys.partitions AS p
  INNER JOIN sys.tables AS t
  ON p.[object_id] = t.[object_id]
  INNER JOIN sys.schemas AS s
  ON s.[schema_id] = t.[schema_id]
  WHERE t.name = N'YourTableNameHere'
  AND s.name = N'dbo'
  AND p.index_id IN (0,1);

My table has 500 million records and the above returns in less than 1ms. Meanwhile,


takes a full 39 minutes, 52 seconds!

They yield the exact same number of rows (in my case, exactly 519326012).

I do not know if that would always be the case.

Upvotes: 35


Reputation: 6478

I found this good article, SQL Server–HOW-TO: quickly retrieve accurate row count for table, from martijnh1, which gives a good recap of each scenario.

I still need to extend this to provide a count based on a specific condition; when I figure that part out, I'll update this answer.

In the meantime, here are the details from the article:

Method 1:


SELECT COUNT(*) FROM Transactions 


Performs a full table scan. Slow on large tables.

Method 2:


SELECT CONVERT(bigint, rows) 
FROM sysindexes 
WHERE id = OBJECT_ID('Transactions') 
AND indid < 2 


Fast way to retrieve row count. Depends on statistics and is inaccurate.

Run DBCC UPDATEUSAGE(Database) WITH COUNT_ROWS, which can take significant time for large tables.

Method 3:


SELECT CAST(p.rows AS float) 
FROM sys.tables AS tbl 
INNER JOIN sys.indexes AS idx ON idx.object_id = tbl.object_id and
idx.index_id < 2 
INNER JOIN sys.partitions AS p ON p.object_id=CAST(tbl.object_id AS int) 
AND p.index_id=idx.index_id 
WHERE ((tbl.name=N'Transactions' 
AND SCHEMA_NAME(tbl.schema_id)='dbo')) 


The way the SQL management studio counts rows (look at table properties, storage, row count). Very fast, but still an approximate number of rows.

Method 4:


SELECT SUM (row_count) 
FROM sys.dm_db_partition_stats 
WHERE object_id=OBJECT_ID('Transactions')    
AND (index_id=0 or index_id=1); 


Quick (although not as fast as method 2) operation and equally important, reliable.

Upvotes: 14


Reputation: 432667

Simple answer:

  • Database vendor independent solution = use the standard = COUNT(*)
  • There are approximate SQL Server solutions but don't use COUNT(*) = out of scope


COUNT(1) = COUNT(*) = COUNT(PrimaryKey) just in case


SQL Server example (1.4 billion rows, 12 columns)

-- NOLOCK here is for me only to let me test for this answer: no more, no less

1 runs, 5:46 minutes, count = 1,401,659,700

--Note, sp_spaceused uses this DMV
SELECT
   Total_Rows= SUM(st.row_count)
FROM
   sys.dm_db_partition_stats st
WHERE
    object_name(object_id) = 'MyBigtable' AND (index_id < 2)

2 runs, both under 1 second, count = 1,401,659,670

The second one has fewer rows = wrong. It would be the same or more depending on writes (deletes are done out of hours here).

Upvotes: 296

Abhishek B Patel

Reputation: 935

For SQL Server, try this:

SELECT T.name, 
       I.rows AS [ROWCOUNT] 
FROM   sys.tables AS T 
       INNER JOIN sys.sysindexes AS I 
               ON T.object_id = I.id AND I.indid < 2 
WHERE T.name = 'Your_Table_Name'

Upvotes: 1


Reputation: 2106

I am late to this question, but here is what you can do with MySQL (as I use MySQL). I am sharing my observations here:


Row Count: 508534
Console output: Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 0.125 sec.
Takes a while for a table with large number of rows, but the row count is very exact.


Row count: 511235
Console output: Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 0.250 sec.
Row count is not exact.

3) SELECT * FROM information_schema.tables WHERE table_schema = DATABASE();

Row count: 507806
Console output: Affected rows: 0 Found rows: 48 Warnings: 0 Duration for 1 query: 1.701 sec.
Row count is not exact.

I am not a MySQL or database expert, but I have found that for very large tables, you can use option 2 or 3 and get a 'fair idea' of how many rows are present.

I needed to get these row counts for displaying some stats on the UI. With the above queries, I knew that the total rows were more than 500,000, so I came up with showing stats like "More than 500,000 rows" without showing exact number of rows.

Maybe I have not really answered the OP's question, but I am sharing what I did in a situation where such statistics were needed. In my case, showing the approximate rows was acceptable and so the above worked for me.

Upvotes: 3


Reputation: 21357

If an insert trigger is too expensive but a delete trigger is affordable, and the table has an auto-increment id, then count the entire table once and remember the result as last-count, together with the highest id counted as last-counted-id.

After that, each day you only need to count the rows with id > last-counted-id, add that to last-count, and store the new last-counted-id.

The delete trigger decrements last-count whenever the id of the deleted record is <= last-counted-id.
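A minimal sketch of this checkpoint scheme, using Python's sqlite3 module as a stand-in DBMS. The names big_table, row_count_checkpoint and refresh_count are hypothetical, and the trigger syntax will differ on other systems:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT);
    -- Single-row checkpoint: rows counted so far and the highest id they cover.
    CREATE TABLE row_count_checkpoint (last_count INTEGER, last_counted_id INTEGER);
    INSERT INTO row_count_checkpoint VALUES (0, 0);
    -- Deleting a row that was already counted lowers the stored count.
    CREATE TRIGGER big_table_after_delete AFTER DELETE ON big_table
    BEGIN
        UPDATE row_count_checkpoint
        SET last_count = last_count - 1
        WHERE OLD.id <= last_counted_id;
    END;
""")

def refresh_count(conn):
    """Count only the rows added since the checkpoint, then advance it."""
    (last_id,) = conn.execute(
        "SELECT last_counted_id FROM row_count_checkpoint").fetchone()
    added, max_id = conn.execute(
        "SELECT COUNT(*), COALESCE(MAX(id), ?) FROM big_table WHERE id > ?",
        (last_id, last_id)).fetchone()
    conn.execute(
        "UPDATE row_count_checkpoint "
        "SET last_count = last_count + ?, last_counted_id = ?",
        (added, max_id))
    return conn.execute(
        "SELECT last_count FROM row_count_checkpoint").fetchone()[0]

conn.executemany("INSERT INTO big_table (payload) VALUES (?)", [("x",)] * 5)
first = refresh_count(conn)                          # scans ids 1..5
conn.execute("DELETE FROM big_table WHERE id = 2")   # trigger decrements the count
conn.executemany("INSERT INTO big_table (payload) VALUES (?)", [("y",)] * 3)
second = refresh_count(conn)                         # scans only ids 6..8
exact = conn.execute("SELECT COUNT(*) FROM big_table").fetchone()[0]
print(first, second, exact)
```

The refresh only touches rows above the checkpoint, so its cost depends on how many rows were added since the last run rather than on the table size. It still assumes ids are only ever assigned in increasing order.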

Upvotes: 2

Kaliyug Antagonist

Reputation: 3612

Well, I'm 5 years late and unsure whether this helps:

I was trying to count the number of rows in a SQL Server table using SQL Server Management Studio and ran into an overflow error, so I used the following:

select count_big(1) FROM [dbname].[dbo].[FactSampleValue];

The result :

24296650578 rows

Upvotes: 5



If you are using Oracle, how about this (assuming the table stats are updated):

select table_name, num_rows, last_analyzed from user_tables where table_name = '<TABLE_NAME>'

last_analyzed will show the time when stats were last gathered.

Upvotes: 0


Reputation: 21

If you have a typical table structure with an auto-incrementing primary key column in which rows are never deleted, the following will be the fastest way to determine the record count and should work similarly across most ANSI compliant databases:

SELECT TOP(1) <primarykeyfield> FROM <table> ORDER BY <primarykeyfield> DESC;

I work with MS SQL tables containing billions of rows that require sub-second response times for data, including record counts. A similar SELECT COUNT(*) would take minutes to process by comparison.

Upvotes: 1

Yangmun Choi

Reputation: 21

A literally insane answer, but if you have some kind of replication system set up (for a system with a billion rows, I hope you do), you can use a rough estimator (like MAX(pk)), divide that value by the number of slaves you have, and run several queries in parallel.

For the most part, you'd partition the queries across slaves based on the best key (or the primary key I guess), in such a way (we're going to use 250000000 as our Rows / Slaves):

-- First slave
SELECT COUNT(pk) FROM t WHERE pk < 250000000
-- Ith slave (counting the first slave as I = 0), where 1 <= I <= N - 2
SELECT COUNT(pk) FROM t WHERE pk >= I*250000000 and pk < (I+1)*250000000
-- Last slave
SELECT COUNT(pk) FROM t WHERE pk >= (N-1)*250000000
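A rough sketch of the range split, with a single in-memory SQLite database standing in for the slaves. The function partition_counts and its parameters are hypothetical; in a real setup each query would go to a different replica and run in parallel. It uses half-open ranges so every pk falls into exactly one bucket:

```python
import sqlite3

def partition_counts(conn, n_workers, chunk):
    """Return one COUNT per worker over consecutive, non-overlapping pk ranges."""
    counts = []
    for i in range(n_workers):
        clauses, params = [], []
        if i > 0:                      # every range except the first has a lower bound
            clauses.append("pk >= ?")
            params.append(i * chunk)
        if i < n_workers - 1:          # every range except the last has an upper bound
            clauses.append("pk < ?")
            params.append((i + 1) * chunk)
        where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
        sql = "SELECT COUNT(pk) FROM t" + where
        counts.append(conn.execute(sql, params).fetchone()[0])
    return counts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (pk) VALUES (?)", [(k,) for k in range(1, 1001)])

counts = partition_counts(conn, n_workers=4, chunk=250)
total = sum(counts)
print(counts, total)
```

Leaving the first range open below and the last range open above means rows outside the estimated MAX(pk) are still counted by someone.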

But you need SQL only. What a bust. Ok, so let's say you're a sadomasochist. On the master (or closest slave) you'd most likely need to create a table for this:

CREATE TABLE counter_table (minpk integer, maxpk integer, cnt integer, slaveid integer)

So instead of only having the selects running in your slaves, you'd have to do an insert, akin to this:

INSERT INTO counter_table VALUES (I*250000000, (I+1)*250000000, (SELECT COUNT(pk) FROM ... ), @@SLAVE_ID)

You may run into issues with slaves writing to a table on master. You may need to get even more sadis- I mean, creative:

-- A table per slave!
INSERT INTO counter_table_slave_I VALUES (...)

You should in the end have a slave that exists last in the path traversed by the replication graph, relative to the first slave. That slave should now have all other counter values, and should have its own values. But by the time you've finished, there probably are rows added, so you'd have to insert another one compensating for the recorded max pk in your counter_table and the current max pk.

At that point, you'd have to do an aggregate function to figure out what the total rows are, but that's easier since you'd be running it on at most the "number of slaves you have and change" rows.

If you're in the situation where you have separate tables in the slaves, you can UNION to get all the rows you need.

    SELECT * FROM counter_table_slave_1
    UNION ALL
    SELECT * FROM counter_table_slave_2

Or you know, be a bit less insane and migrate your data to a distributed processing system, or maybe use a Data Warehousing solution (which will give you awesome data crunching in the future too).

Do note, this does depend on how well your replication is set up. Since the primary bottleneck will most likely be persistent storage, if you have cruddy storage or poorly segregated data stores with heavy neighbor noise, this will probably run you slower than just waiting for a single SELECT COUNT(*) ...

But if you have good replication, then your speed gains should be directly related to the number of slaves. In fact, if it takes 10 minutes to run the counting query alone, and you have 8 slaves, you'd cut your time to less than a couple minutes. Maybe an hour to iron out the details of this solution.

Of course, you'd never really get an amazingly accurate answer since this distributed solving introduces a bit of time where rows can be deleted and inserted, but you can try to get a distributed lock of rows at the same instant and get a precise count of the rows in the table for a particular moment in time.

Actually, this seems impossible, since you're basically stuck with an SQL-only solution, and I don't think you're provided a mechanism to run a sharded and locked query across multiple slaves, instantly. Maybe if you had control of the replication log file... which means you'd literally be spinning up slaves for this purpose, which is no doubt slower than just running the count query on a single machine anyway.

So there's my two 2013 pennies.

Upvotes: 2


Reputation: 28882

Put an index on some column. That should allow the optimizer to perform a full scan of the index blocks, instead of a full scan of the table. That will cut your IO costs way down. Look at the execution plan before and after. Then measure wall clock time both ways.
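A small sketch of the before/after comparison, using Python's sqlite3 module only because it is easy to run anywhere. The table wide_table and the index name are made up, and the plan text is specific to SQLite and its version; on another DBMS you would use its own execution plan tool:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wide_table (id INTEGER PRIMARY KEY, status INTEGER, notes TEXT)")
conn.executemany(
    "INSERT INTO wide_table (status, notes) VALUES (?, ?)",
    [(i % 3, "x" * 200) for i in range(1000)])

def count_plan(conn):
    """Return the human-readable plan steps for the COUNT(*) query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM wide_table").fetchall()
    return [row[-1] for row in rows]   # the last column holds the description

plan_before = count_plan(conn)
count_before = conn.execute("SELECT COUNT(*) FROM wide_table").fetchone()[0]

# A narrow index gives the optimizer a smaller structure it may scan instead.
conn.execute("CREATE INDEX idx_wide_table_status ON wide_table (status)")

plan_after = count_plan(conn)
count_after = conn.execute("SELECT COUNT(*) FROM wide_table").fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
```

Whether the optimizer actually switches to the index is its own decision, which is why comparing the two plans (and timing both) matters more than assuming the index helps.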

Upvotes: 0

john rains

Reputation: 391

I'm nowhere near as expert as others who have answered, but I had an issue with a procedure that selects a random row from a table (not overly relevant): I needed the number of rows in my reference table to calculate the random index. The traditional Count(*) or Count(1) works, but my query occasionally took up to 2 seconds to run. So instead (for my table named 'tbl_HighOrder') I am using:

Declare @max int

Select @max = Row_Count
From sys.dm_db_partition_stats
Where Object_Name(Object_Id) = 'tbl_HighOrder'
  And index_id < 2  -- heap (0) or clustered index (1) only, so nonclustered indexes are skipped

It works great and query times in Management Studio are zero.

Upvotes: 5

Branko Dimitrijevic

Reputation: 52157

Not exactly a DBMS-agnostic solution, but at least your client code won't see the difference...

Create another table T with just one row and one integer field N [1], and create an INSERT TRIGGER that just executes:


Also create a DELETE TRIGGER that executes:


A DBMS worth its salt will guarantee the atomicity of the operations above [2], and N will contain the accurate count of rows at all times, which is then super-quick to get by simply:


While triggers are DBMS-specific, selecting from T isn't and your client code won't need to change for each supported DBMS.

However, this can have some scalability issues if the table is INSERT or DELETE-intensive, especially if you don't COMMIT immediately after INSERT/DELETE.

[1] These names are just placeholders - use something more meaningful in production.

[2] I.e. N cannot be changed by a concurrent transaction between reading and writing to N, as long as both reading and writing are done in a single SQL statement.
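A minimal sketch of the counter-table idea, using Python's sqlite3 module as a stand-in DBMS. T and N follow the placeholder names above; the table name big_table, the trigger names, and the exact UPDATE and SELECT statements are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT);

    -- One row, one integer field: the running row count.
    CREATE TABLE T (N INTEGER NOT NULL);
    INSERT INTO T VALUES (0);

    -- Each inserted row bumps the counter in a single statement.
    CREATE TRIGGER big_table_ins AFTER INSERT ON big_table
    BEGIN
        UPDATE T SET N = N + 1;
    END;

    -- Each deleted row lowers it again.
    CREATE TRIGGER big_table_del AFTER DELETE ON big_table
    BEGIN
        UPDATE T SET N = N - 1;
    END;
""")

conn.executemany("INSERT INTO big_table (payload) VALUES (?)", [("row",)] * 10)
conn.execute("DELETE FROM big_table WHERE id <= 3")

counter = conn.execute("SELECT N FROM T").fetchone()[0]              # cheap lookup
exact = conn.execute("SELECT COUNT(*) FROM big_table").fetchone()[0]  # full count
print(counter, exact)
```

SQLite triggers fire once per affected row, so a multi-row DELETE adjusts N once for each row removed. Other systems may use statement-level triggers by default and would need the trigger written accordingly.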

Upvotes: 4


Reputation: 573

The fastest way by far on MySQL is:


You will instantly get all your tables with the row count (which is the total) along with plenty of extra information if you want.

Upvotes: 35

Mainsh S

Reputation: 51

I use

select /*+ parallel(a) */  count(1) from table_name a;

Upvotes: 5

Jesse Webb

Reputation: 45303

Is there a better way to get the EXACT count of the number of rows of a table?

To answer your question simply, No.

If you need a DBMS independent way of doing this, the fastest way will always be:


Some DBMS vendors may have quicker ways which will work for their systems only. Some of these options are already posted in other answers.

COUNT(*) should be optimized by the DBMS (at least any PROD worthy DB) anyway, so don't try to bypass their optimizations.

On a side note:
I am sure many of your other queries also take a long time to finish because of your table size. Any performance concerns should probably be addressed by thinking about your schema design with speed in mind. I realize you said that it is not an option to change but it might turn out that 10+ minute queries aren't an option either. 3rd NF is not always the best approach when you need speed, and sometimes data can be partitioned in several tables if the records don't have to be stored together. Something to think about...

Upvotes: 12

Mike Woodhouse

Reputation: 52326

I don't think there is a general always-fastest solution: some RDBMS/versions have a specific optimization for SELECT COUNT(*) that uses faster options, while others simply table-scan. For the second set you'd need to go to the documentation/support sites; they will probably need a more specific query to be written, usually one that hits an index in some way.


Here's a thought that might work, depending on your schema and distribution of data: do you have an indexed column that references an increasing value, a numeric increasing ID, say, or even a timestamp or date? Then, assuming deletes don't happen, it should be possible to store the count up to some recent value (yesterday's date, highest ID value at some recent sample point) and add the count beyond that, which should resolve very quickly in the index. Very dependent on values and indices, of course, but applicable to pretty much any version of any DBMS.

Upvotes: 3

Alireza Maddah

Reputation: 5885

If SQL Server edition is 2005/2008, you can use DMVs to calculate the row count in a table:

-- Shows all user tables and row counts for the current database 
-- Remove is_ms_shipped = 0 check to include system objects 
-- i.index_id < 2 indicates clustered index (1) or heap (0) 
SELECT o.name, ddps.row_count 
FROM sys.indexes AS i 
 INNER JOIN sys.objects AS o ON i.OBJECT_ID = o.OBJECT_ID 
 INNER JOIN sys.dm_db_partition_stats AS ddps ON i.OBJECT_ID = ddps.OBJECT_ID 
 AND i.index_id = ddps.index_id 
WHERE i.index_id < 2 
 AND o.is_ms_shipped = 0 

For SQL Server 2000 database engine, sysindexes will work, but it is strongly advised to avoid using it in future editions of SQL Server as it may be removed in the near future.

Sample code taken from: How To Get Table Row Counts Quickly And Painlessly

Upvotes: 6

Denis de Bernardy

Reputation: 78561

I have come across articles that state that SELECT COUNT(*) FROM TABLE_NAME will be slow when the table has lots of rows and lots of columns.

That depends on the database. Some speed up counts, for instance by keeping track of whether rows are live or dead in the index, allowing for an index only scan to extract the number of rows. Others do not, and consequently require visiting the whole table and counting live rows one by one. Either will be slow for a huge table.

Note that you can generally extract a good estimate by using query optimization tools, table statistics, etc. In the case of PostgreSQL, for instance, you could parse the output of explain select count(*) from yourtable and get a reasonably good estimate of the number of rows. Which brings me to your second question.

I have a table that might contain even billions of rows [it has approximately 15 columns]. Is there a better way to get the EXACT count of the number of rows of a table?

Seriously? :-) You really mean the exact count from a table with billions of rows? Are you really sure? :-)

If you really do, you could keep a trace of the total using triggers, but mind concurrency and deadlocks if you do.

Upvotes: 10


Reputation: 22086

You can try sp_spaceused (Transact-SQL):

Displays the number of rows, disk space reserved, and disk space used by a table, indexed view, or Service Broker queue in the current database, or displays the disk space reserved and used by the whole database.

Upvotes: 12
