Reputation: 230038
I understand the best way to count the number of rows in an SQL table is count(*) (or equivalently count(PrimaryKey)).
Why not just implement a counter and return it for this specific query? Is it because this query is not a common use case?
If the answers vary according to SQL engine, I'd like to hear the differences - but in any case, I'm interested in the actual implementation in production SQL engines.
Upvotes: 10
Views: 6826
Reputation: 39946
In MS SQL server, doing a Count(*) on a table always does an index scan (on primary key) or a table scan (both bad). For large tables, this can take awhile.
Instead, there is a cool trick to show the current number of records nearly instantly (same one that Microsoft uses when you right click on the table and select properties):
--SQL 2005 or 2008
select sum (spart.rows)
from sys.partitions spart
where spart.object_id = object_id('YourTable')
and spart.index_id < 2
--SQL 2000
select max(ROWS) from sysindexes
where id = object_id('Resolve_Audit')
This number may be slightly off depending on how often SQL updates the index statistics, but if you need a ballpark, and not an exact number, these work great.
Upvotes: 4
Reputation: 63538
It's not constant time, because in transactional engines it needs to check how many rows exist in the current transaction, which generally involves a whole table scan.
Optimising COUNT(*) with no where clause is not a particularly useful optimisation for a database to do at the expense of other things; users of large tables rarely do such a query and it would not help at all if there was a WHERE clause present.
MyISAM in MySQL does "cheat" by storing the exact row count, but it can only do this because it doesn't have MVCC therefore doesn't need to worry about which transactions the rows are in.
Upvotes: 2
Reputation: 35459
Apparently O(N) on PostgreSQL:
=> explain select count(*) from tests;
QUERY PLAN
---------------------------------------------------------------------
Aggregate (cost=37457.88..37457.89 rows=1 width=0)
-> Seq Scan on tests (cost=0.00..33598.30 rows=1543830 width=0)
(2 rows)
(Seq Scan means it has to scan the whole table)
Upvotes: 0
Reputation: 753795
With Informix, in the absence of complicating factors such as LBAC (Label-Based Access Control), then SELECT COUNT(*) FROM SomeTable
is O(1); it pulls the information from the control information it maintains. If there's a WHERE clause or LBAC or the table is a view or any of a number of other factors, then it ceases to be O(1).
Upvotes: 0
Reputation: 52346
The performance of a COUNT(*) based on an index or on a table really depends on the segment size. You can have a 1GB table that only has a single row in it, but Oracle might still have to scan the entire allocated space. Inserting another million rows might not affect performance at all if it does not alter the high-water mark. Indexes work in a similar manner, where different patterns of deletes can leave different amounts of free space in the index structure and cause index scans to give better or worse performance than O(N).
So, theoretically it's O(N). In practice there are iplementation issues that can cause it to be very different.
For example there are some cases where an Oracle data warehouse might give better than O(N) performance. In particular the optimizer could scan a bitmap index, and the size of a bitmap index is only weakly related to the size of the table, unlike a b-tree index. This is because of the compression methodology which makes the index size dependent on the size of the table, the number of unique values, the distribution of values throughout the table, and the historical loading pattern as well I believe. So, doubling the number of rows in the table might only increase the size of the index by 10%.
In the presence of a materialized view you might also get O(1) by reading a summary table (a trigger is an unsafe way of doing this).
Upvotes: 3
Reputation: 103505
A database could store the number of rows in a table and respond O(1) to
select count(*) From MyTable
But, really, what good would that do them? Any variation from that (say select count(*) from MyTable where Category = 5
) would require a full table scan (or index scan) and would be O(N).
Upvotes: 0
Reputation: 421988
It's normally O(N).
If O(1) response to such query is needed, you can easily accomplish it either using:
Example:
CREATE TABLE CountingTable ( Count int )
INSERT CountingTable VALUES(0)
CREATE TRIGGER Counter ON Table FOR INSERT, UPDATE, DELETE AS
BEGIN
DECLARE @added int, @Removed int
SET @added = SELECT COUNT(*) FROM inserted
SET @removed = SELECT COUNT(*) FROM deleted
UPDATE CountingTable SET Count = Count + @added - @removed
END
Upvotes: 1
Reputation: 89661
There is a (inaccurate) shortcut in SQL Server where you can look at the count in the metadata sys.partitions for a particular object like an index on a table.
The operation is O(1), but is only an estimate.
Upvotes: 0
Reputation: 13118
In some RDBM's this is O(1) (most notably MySQL), put AFAIK it is generally frowned upon and considered an "ugly performance hack". The reasons is that if you have transactions (which every real RDBM should have), the total number of rows in the table might or might not be equal to the total number you can see from the current transaction. This is why the server needs to check which rows are actually visible to your transaction, making it more O(n) than O(1).
If you want to optimize the process of getting the number of rows and are satisfied with an approximate result, most RDBM's have special "info" tables which hold information about the tables, including the approximate number of rows (again, it is not the exact number of rows because of the transactions).
Upvotes: 9
Reputation: 27478
No this is not a common use case. Most row counts I have seen have some where clause involved.
The main reason this is not implemented though is the row counter would be a cause of contention in a multi user environment. Every time a row was inserted or deleted the counter would need updating effectively locking the whole table for each insert/delete.
Upvotes: 8
Reputation: 28837
For Oracle it is at generally O(N) unless the query result is in cache, as it essentially has to iterate all the blocks, or iterate the index to count them.
Upvotes: 1