Reputation: 25188
When you limit the number of rows to be returned by a SQL query, usually used in paging, there are two methods to determine the total number of records:
Include the SQL_CALC_FOUND_ROWS
option in the original SELECT
, and then get the total number of rows by running SELECT FOUND_ROWS()
:
SELECT SQL_CALC_FOUND_ROWS * FROM table WHERE id > 100 LIMIT 10;
SELECT FOUND_ROWS();
Run the query normally, and then get the total number of rows by running SELECT COUNT(*)
SELECT * FROM table WHERE id > 100 LIMIT 10;
SELECT COUNT(*) FROM table WHERE id > 100;
Which method is the best / fastest?
Upvotes: 197
Views: 166082
Reputation: 28834
MySQL has started deprecating SQL_CALC_FOUND_ROWS
functionality with version 8.0.17 onwards.
So, it is always preferred to consider executing your query with LIMIT
, and then a second query with COUNT(*)
and without LIMIT
to determine whether there are additional rows.
From docs:
The SQL_CALC_FOUND_ROWS query modifier and accompanying FOUND_ROWS() function are deprecated as of MySQL 8.0.17 and will be removed in a future MySQL version.
COUNT(*) is subject to certain optimizations. SQL_CALC_FOUND_ROWS causes some optimizations to be disabled.
Use these queries instead:
SELECT * FROM tbl_name WHERE id > 100 LIMIT 10; SELECT COUNT(*) WHERE id > 100;
Also, SQL_CALC_FOUND_ROWS
has been observed to have more issues generally, as explained in the MySQL WL# 12615 :
SQL_CALC_FOUND_ROWS has a number of problems. First of all, it's slow. Frequently, it would be cheaper to run the query with LIMIT and then a separate SELECT COUNT() for the same query, since COUNT() can make use of optimizations that can't be done when searching for the entire result set (e.g. filesort can be skipped for COUNT(*), whereas with CALC_FOUND_ROWS, we must disable some filesort optimizations to guarantee the right result)
More importantly, it has very unclear semantics in a number of situations. In particular, when a query has multiple query blocks (e.g. with UNION), there's simply no way to calculate the number of “would-have-been” rows at the same time as producing a valid query. As the iterator executor is progressing towards these kinds of queries, it is genuinely difficult to try to retain the same semantics. Furthermore, if there are multiple LIMITs in the query (e.g. for derived tables), it's not necessarily clear to which of them SQL_CALC_FOUND_ROWS should refer to. Thus, such nontrivial queries will necessarily get different semantics in the iterator executor compared to what they had before.
Finally, most of the use cases where SQL_CALC_FOUND_ROWS would seem useful should simply be solved by other mechanisms than LIMIT/OFFSET. E.g., a phone book should be paginated by letter (both in terms of UX and in terms of index use), not by record number. Discussions are increasingly infinite-scroll ordered by date (again allowing index use), not by paginated by post number. And so on.
Upvotes: 36
Reputation: 63
Simple example on table with 2.000.000 rows and query like this :
select fieldname
from table_add
where
descryption_per like '%marihuana%'
or addiction_per like '%alkohol%';
it is a full table scan every query - so it take time x 2. I mean "select count(*) from .....
Upvotes: 0
Reputation: 4732
It depends. See the MySQL Performance Blog post on this subject: To SQL_CALC_FOUND_ROWS
or not to SQL_CALC_FOUND_ROWS
?
Just a quick summary: Peter says that it depends on your indexes and other factors. Many of the comments to the post seem to say that SQL_CALC_FOUND_ROWS
is almost always slower - sometimes up to 10x slower - than running two queries.
Upvotes: 132
Reputation: 2958
There are other options for you to benchmark:
1.) A window function will return the actual size directly (tested in MariaDB):
SELECT
`mytable`.*,
COUNT(*) OVER() AS `total_count`
FROM `mytable`
ORDER BY `mycol`
LIMIT 10, 20
2.) Thinking out of the box, most of the time users don't need to know the EXACT size of the table, an approximate is often good enough.
SELECT `TABLE_ROWS` AS `rows_approx`
FROM `INFORMATION_SCHEMA`.`TABLES`
WHERE `TABLE_SCHEMA` = DATABASE()
AND `TABLE_TYPE` = "BASE TABLE"
AND `TABLE_NAME` = ?
Upvotes: 4
Reputation: 2277
Removing some unnecessary SQL and then COUNT(*)
will be faster than SQL_CALC_FOUND_ROWS
. Example:
SELECT Person.Id, Person.Name, Job.Description, Card.Number
FROM Person
JOIN Job ON Job.Id = Person.Job_Id
LEFT JOIN Card ON Card.Person_Id = Person.Id
WHERE Job.Name = 'WEB Developer'
ORDER BY Person.Name
Then count without unnecessary part:
SELECT COUNT(*)
FROM Person
JOIN Job ON Job.Id = Person.Job_Id
WHERE Job.Name = 'WEB Developer'
Upvotes: 6
Reputation: 18693
According to the following article: https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
If you have an INDEX on your where clause (if id is indexed in your case), then it is better not to use SQL_CALC_FOUND_ROWS and use 2 queries instead, but if you don't have an index on what you put in your where clause (id in your case) then using SQL_CALC_FOUND_ROWS is more efficient.
Upvotes: 17
Reputation: 1769
IMHO, the reason why 2 queries
SELECT * FROM count_test WHERE b = 666 ORDER BY c LIMIT 5;
SELECT count(*) FROM count_test WHERE b = 666;
are faster than using SQL_CALC_FOUND_ROWS
SELECT SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;
has to be seen as a particular case.
It in facts depends on the selectivity of the WHERE clause compared to the selectivity of the implicit one equivalent to the ORDER + LIMIT.
As Arvids told in comment (http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/#comment-1174394), the fact that the EXPLAIN use, or not, a temporay table, should be a good base for knowing if SCFR will be faster or not.
But, as I added (http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/#comment-8166482), the result really, really depends on the case. For a particular paginator, you could get to the conclusion that “for the 3 first pages, use 2 queries; for the following pages, use a SCFR” !
Upvotes: 10
Reputation: 942
When choosing the "best" approach, a more important consideration than speed might be the maintainability and correctness of your code. If so, SQL_CALC_FOUND_ROWS is preferable because you only need to maintain a single query. Using a single query completely precludes the possibility of a subtle difference between the main and count queries, which may lead to an inaccurate COUNT.
Upvotes: 23