Matt McCormick
Matt McCormick

Reputation: 13200

How to do fast counting on large tables?

I have large MySQL tables with hundreds of thousands of rows.

I need to write a query on a customers table which gets the count of when customers will be available to contact again.

eg.

SELECT 'This week', COUNT(*) FROM customers 
WHERE sales_person_id = 1 AND DATEDIFF(NOW(), available_date) < 7

UNION

SELECT 'Next week', COUNT(*) FROM customers 
WHERE sales_person_id = 1 AND DATEDIFF(NOW(), available_date) >= 7 
    AND DATEDIFF(NOW(), available_date) < 14

UNION

... (a few more like this)

Having written a similar query on a different large table, I did notice that changing the engine from InnoDB to MyISAM sped up the query considerably (InnoDB is not needed for these tables as they do not have foreign key checks). Is there anything else I can do to speed up counts like this (other than indexing the appropriate fields)?

Upvotes: 4

Views: 400

Answers (3)

OMG Ponies
OMG Ponies

Reputation: 332661

  1. Never do in multiple queries, what you can do in one.

    If you create a derived table/inline view that has the necessary start & end dates, this can be used to produce the desired result in a single query using a GROUP BY. MySQL doesn't have a recursive function, so you have to use the NUMBERS table trick to generate dates...

    1. Create a table that only holds incrementing numbers - easy to do using an auto_increment:

      DROP TABLE IF EXISTS `example`.`numbers`;
      CREATE TABLE  `example`.`numbers` (
       `id` int(10) unsigned NOT NULL auto_increment,
        PRIMARY KEY  (`id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
      
    2. Populate the table using:

      INSERT INTO NUMBERS (id)
      VALUES (NULL)
      

      ...for as many values as you need.

    3. Use DATE_ADD to construct a list of dates, increasing the days based on the NUMBERS.id value.

      SELECT x.start_dt,
             x.end_dt
        FROM (SELECT DATE_ADD(NOW(), INTERVAL n.id - 1 DAY) AS start_dt,
                     DATE_ADD(NOW(), INTERVAL n.id + 6 DAY) AS end_dt
                FROM `numbers` n
               WHERE DATE_ADD(NOW(), INTERVAL (n.id - 1) DAY) <= '2011-01-01') x
      
    4. JOIN onto your table of data based on the datetime portion:

        SELECT x.start_dt,
               x.end_dt,
               COUNT(*) AS num
          FROM (SELECT DATE_ADD(NOW(), INTERVAL n.id - 1 DAY) AS start_dt,
                       DATE_ADD(NOW(), INTERVAL n.id + 6 DAY) AS end_dt
                  FROM `numbers` n
                 WHERE DATE_ADD(NOW(), INTERVAL (n.id - 1) DAY) <= '2011-01-01') x
          JOIN CUSTOMERS c ON c.available_date BETWEEN x.start_dt
                                                   AND x.end_dt
      GROUP BY x.start_dt, x.end_dt
      
  2. Don't use functions performed upon actual column data - IE: DATEDIFF(NOW(), *available_date*) - because a database can't use an index (if one exists) on the available_date column because the data has been altered away from the index value.

Upvotes: 3

Damir Sudarevic
Damir Sudarevic

Reputation: 22187

Focus on the WHERE clause.

  • Is there index on fields in the WHERE clause?
  • Can you replace the datediff() function with a constant, it is being evaluated for every row.

Upvotes: 1

Mchl
Mchl

Reputation: 62395

WHERE sales_person_id = 1 AND available_date BETWEEN CURDATE() - INTERVAL 1 WEEK AND CURDATE()

Doing it this way should let MySQL use a composite index created on (sales_person_id, available_date) columns (use EXPLAIN to check)

Upvotes: 5

Related Questions