Reputation: 11026

Performance of a duplicate agregate function in a select query

I have a SQL query (in MySQL) that select a total and a count of how many completed task have the table:

SELECT
  count(*) as total, IF(SUM(NOT `completed`) IS NULL,0,SUM(NOT `completed`)) as incomplete
FROM
  tasks

The table can be made something like:

CREATE TABLE `tasks` (
  `clave` int(11) NOT NULL AUTO_INCREMENT,
  `completed` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'If it is 0 isn\'t completed, otherwise is completed',
  PRIMARY KEY (`clave`)
) ENGINE=InnoDB;

You can observe that in the query I have used SUM(NOT completed) two times, one to check if it will return null (and return 0 instead) and one to return the count (when it isn't null). A SUM() function return null if the table (or query) is empty, namely the table have not row at all.

As you are trying to SUM two times, I think that MySQL will calculate the sum two times.

I have tested to try if I can set an alias for the column and then use it in a IF so mysql don't need to recalculate it.

And as a requirement, incomplete column must not be null in any circumstance (for example if the table tasks is empty).

And my question, is this efficient? Will MySQL need to recalculate the sum each time or it remember it?

Upvotes: 3

Answers (5)

John Ruddell

Reputation: 25862

this may be a better way to do it because you do less calculations for performance

using the schema richard posted - see FIDDLE

SELECT 
    total, 
    num_complete, 
    total - num_complete as num_incomplete
FROM(
    SELECT
        COUNT(*) as total,
        SUM(IF(t.completed > 0, 1, 0)) as num_complete
    FROM status_log t
) as t

checking for empty table per OP request

SELECT 
    total, 
    num_complete, 
    total - num_complete as num_incomplete
FROM(
    SELECT
        COUNT(*) as total,
        IF(COUNT(*) > 0, SUM(IF(t.completed > 0, 1, 0)), 0) as num_complete
    FROM status_log t
) as t

you should do your check for null or empty table in another programming language ... generally speaking SQL should be used to query from the table.. not when it is empty though... if it is completely empty then you should check for a null response when running this query in another programming language. that will improve the performance of it a lot.

Upvotes: 4

Joseph B

Reputation: 5669

Thanks to John Ruddell and PhoneixS for clarifications. Here's a version using CASE WHEN:

select 
  count(*),
  sum(
   case completed
    when 1 then 1 
    else 0
   end) 'completed',
  sum(
   case when completed is null then 1 
    else 0
   end) 'incomplete'
from tasks;

SQL Fiddle

Upvotes: -1

Joseph B

Reputation: 5669

If you make it into an inline view, you can select multiple times from the query, as below:

SELECT total, incomplete, incomplete
FROM
(
SELECT
  count(*) as total, IF(SUM(NOT `completed`) IS NULL,0,SUM(NOT `completed`)) as incomplete
FROM
  tasks
) incomplete_count;

If you selected from a sub-query or an inline view multiple times, then MySQL does not have to re-calculate it. Also, you can perform any additional operations / filters in the outer query.

Upvotes: 0

Richard Pascual

Reputation: 2021

Here is a solution using a CASE statement. You can get both aggregate values from a single SQL statement. The second query example shows how the CASE statement can get you a pivoted output compared to a direct aggregation query statement:

[SQL Fiddle][1]

MySQL 5.5.32 Schema Setup:

CREATE TABLE status_log 
    (
     id int auto_increment primary key, 
     type varchar(20), 
     status varchar(30)
    );

INSERT INTO status_log
(type, status)
VALUES
('Alpha', 'COMPLETE'),
('Bravo', 'INCOMPLETE'),
('Charlie', 'INCOMPLETE'),
('Delta', 'COMPLETE'),
('Echo', 'COMPLETE'),
('Foxtrot', 'INCOMPLETE'),
('Golf', 'COMPLETE'),
('Hotel', 'COMPLETE')

Query 1:

SELECT count(1) as count_by_status, status
  FROM status_log
 GROUP BY status

Results:

| COUNT_BY_STATUS |     STATUS |
|-----------------|------------|
|               5 |   COMPLETE |
|               3 | INCOMPLETE |

Query 2:

SELECT count(*) as total_count, 
  sum(case when status = 'COMPLETE' then 1
           else 0 end) as completed_count,
  sum(case when status = 'INCOMPLETE' then 1
           else 0 end) as incomplete_count
  FROM status_log

Results:

| TOTAL_COUNT | COMPLETED_COUNT | INCOMPLETE_COUNT |
|-------------|-----------------|------------------|
|           8 |               5 |                3 |

Upvotes: 2

Evan Volgas

Reputation: 2911

The efficiency (or lack thereof) will have more to do with the indices present on the table, as well as the storage engine itself. Likewise, whether or not the result is stored in cache will have more to do with the storage engine than with the statement you are writing.

If I were writing this on an INNODB-based storage engine, I would do the following:

 SELECT   
       count(*) as total,    
       SUM(CASE WHEN completed = 0 OR completed IS NULL THEN  1 ELSE 0 END) AS incomplete 
 FROM tasks;

And I would index my "completed" column in order to do this.

The reason I would change from "IF" to case-when is largely just do the code-portability. CASE WHEN will be easier to move to other databases, if need be.

Also, an index on completed will allow this query to simply evaluate the index, the not the table value itself. That, with LRU, should give you plenty in the way of efficiency.

Upvotes: 2

Performance of a duplicate agregate function in a select query

Answers (5)

Related Questions