pillarOfLight
pillarOfLight

Reputation: 8982

select rows as well as a total count in one query in mysql

suppose I have a table t and table t has 15000 entries

suppose the query

SELECT * FROM t WHERE t.nid <1000

returns 1000 rows

but then I only want the first 10 rows so I do a LIMIT

SELECT * FROM t WHERE t.nid <1000 LIMIT 10

is it possible to construct a single query in which in addition to returning the 10 rows information with the LIMIT clause above, it also returns the total count of the rows that satisfy the conditions set in the WHERE clause, hence in addition to returning the 10 rows above, it also returns 1000 since there are a total of 1000 rows satisfying the WHERE clause...and have both returned in a single query

Upvotes: 7

Views: 3866

Answers (6)

LSerni
LSerni

Reputation: 57408

Preferred solution

First of all, the found_rows() function is not portable (it is a MySQL extension) and is going to be removed. As user @Zveddochka pointed out, it has already been deprecated in MySQL 8.0.17.

But more importantly, it turns out that if you use proper indexing, then running two queries is actually faster. The SQL_CALC_FOUND_ROWS directive is achieved through a "virtual scan" that incurs an additional recovery cost. When the query is not indexed, then this cost would be the same of a COUNT(), and therefore running two queries will cost double - i.e., using SQL_CALC_FOUND_ROWS will make things run 50% faster.

But what happens when the query is properly indexed? The guys at Percona checked it out. And it turns out that not only the COUNT() is blazing fast since it only accesses metadata and indexes, and the query without SQL_CALC_FOUND_ROWS is faster because it doesn't incur any additional cost; the cost of the two queries combined is less than the cost of the enhanced single query:

Results with SQL_CALC_FOUND_ROWS are following: for each b value it takes 20-100 sec to execute uncached and 2-5 sec after warmup. Such difference could be explained by the I/O which required for this query – mysql accesses all 10k rows this query could produce without LIMIT clause.

The results are following: it takes 0.01-0.11 sec to run this query first time and 0.00-0.02 sec for all consecutive runs.

So, as we can see, total time for SELECT+COUNT (0.00-0.15 sec) is much less than execution time for original query (2-100 sec). Let’s take a look at EXPLAINs...

So, what to do?

// Run two queries ensuring they satisfy exactly the same conditions

$field1 = "Field1, Field2, blah blah blah";
$field2 = "COUNT(*) AS rows";

$where  = "Field5 = 'X' AND Field6 = 'Y' AND blah blah";

$cntQuery = "SELECT {$field2} FROM {$joins} WHERE {$where}";
$rowQuery = "SELECT {$field1} FROM {$joins} WHERE {$where} LIMIT {$limit}";

Now the first query returns the count, the second query returns the actual data.

Old answer (useful just for non-indexed tables)

Don't do this. If you find out this section of the answer works for you better than the section above, it's almost certainly a signal that something else is not optimal in your setup - most likely you're not using the indexes properly, or you need to update your MySQL server, or run an analyze/optimize of the database to update cardinality statistics.

You can, but I think it would be a performance killer.

Your best option would be to use the SQL_CALC_FOUND_ROWS MySQL extension and issue a second query to recover the full number of rows using FOUND_ROWS().

 SELECT SQL_CALC_FOUND_ROWS * FROM t WHERE t.nid <1000 LIMIT 10;
 SELECT FOUND_ROWS();

See e.g http://www.arraystudio.com/as-workshop/mysql-get-total-number-of-rows-when-using-limit.html

Or you could simply run the full query without LIMIT clause, and retrieve only the first ten rows. Then you can use one query as you wanted, and also get the row count through mysql_num_rows(). This is not ideal, but also not so catastrophic for most queries.

If you do this last, though, be very careful to close the query and free its resources: I have found out that retrieving less than the full resultset and forgetting to free the rs handle is one outstanding cause of "metadata locking".

Upvotes: 9

Nairi Abgaryan
Nairi Abgaryan

Reputation: 666

Count(*) time complexity is O(1), so you can use a subquery

SELECT *, (SELECT COUNT(*) FROM t WHERE t.nid <1000) AS cnt
FROM t 
WHERE t.nid <1000 
LIMIT 10

Upvotes: 0

Rick James
Rick James

Reputation: 142298

There are many things that need discussing.

  • A LIMIT without an ORDER BY is somewhat unpredictable, hence somewhat meaningless.

  • But if you add an ORDER BY, it may need to find all the rows, sort them then deliver only the 10 you want.

  • Or, the ORDER BY may be handled adequately by an INDEX.

Your particular query, if turned into 2 queries (as needed after 8.0.17), would be

SELECT * FROM t WHERE t.nid < 1000 LIMIT 10;
SELECT COUNT(*) FROM t WHERE t.nid < 1000;

Note that each of those would benefit from INDEX(nid). The first would pick 10 items from the index's BTree, then look them up in the data's BTree -- only 10 rows touched in each. The second would scan the INDEX until it hits 1000, and not touch the data BTree.

If you add an ORDER BY as advised, then, the first query:

SELECT * FROM t WHERE t.nid < 1000 ORDER BY t.nid LIMIT 10;

will work identically as above. But

SELECT * FROM t WHERE t.nid < 1000 ORDER BY t.abcd LIMIT 10;

will need to scan lots of rows, and be quite slow. And probably use a temp table and filesort. (Check EXPLAIN for details.) INDEX(nid, abcd) would help, but only a little.

And there are other variants, such as when the index can be "covering".

What is the goal of having "one query"?

  • Speed? -- as discussed above, there are other factors that are more pertinent.

  • Consistency? -- You may need a transaction to avoid, for example, getting N rows from the first query and a smaller number from the COUNT.

     BEGIN;
     SELECT * ...
     SELECT COUNT(*) ...
     COMMIT;
    
  • Single command? -- Consider a stored procedure that combines the 2 statements. Or

     SELECT * FROM t WHERE t.nid < 1000 LIMIT 10
     UNION ALL
     SELECT COUNT(*) FROM t WHERE t.nid < 1000;
    

but that gets tricky because the number of columns is different, so some kludge would be needed to make the second query have the same number of columns. Another variant involves GROUP BY WITH ROLLUP. (But it may be even harder to fabricate.)

Lukasz's Answer looks promising. However, it gives an extra column (which might be good) and its performance needs to be tested. If you are on 8.0 and their answer works well for you, accept that Answer.

Upvotes: 2

Lukasz Szozda
Lukasz Szozda

Reputation: 175726

"is it possible to construct a single query in which in addition to returning the 10 rows information with the LIMIT clause above, it also returns the total count of the rows that satisfy the conditions set in the WHERE clause"

Yes, it is possible to do both in single query, by using windowed function i.e. COUNT(*) OVER()(MySQL 8.0+):

SELECT t.*, COUNT(*) OVER() AS cnt 
FROM t 
WHERE t.nid <1000 
LIMIT 10;  

db<>fiddle demo


Sidenote:

LIMIT without explicit ORDER BY is non-deterministic. It could return different results between multiple runs.

Upvotes: 4

David Grenier
David Grenier

Reputation: 1241

Sounds like you want FOUND_ROWS()

SELECT SQL_CALC_FOUND_ROWS * FROM t WHERE t.nid <1000 LIMIT 10;
SELECT FOUND_ROWS();

Upvotes: -1

charlee
charlee

Reputation: 1369

You can try SQL_CALC_FOUND_ROWS, which can get a count of total records without running the statement again.

SELECT SQL_CALC_FOUND_ROWS * FROM t WHERE t.nid <1000 LIMIT 10;   -- get records
SELECT FOUND_ROWS();                -- get count

Reference: http://dev.mysql.com/doc/refman/5.0/en/information-functions.html

Upvotes: 4

Related Questions