Jason
Jason

Reputation: 95

How can I optimize this mysql query to find maximum simultaneous calls?

I'm trying to calculate maximum simultaneous calls. My query, which I believe to be accurate, takes way too long given ~250,000 rows. The cdrs table looks like this:

+---------------+-----------------------+------+-----+---------+----------------+
| Field         | Type                  | Null | Key | Default | Extra          |
+---------------+-----------------------+------+-----+---------+----------------+
| id            | bigint(20) unsigned   | NO   | PRI | NULL    | auto_increment |
| CallType      | varchar(32)           | NO   |     | NULL    |                |
| StartTime     | datetime              | NO   | MUL | NULL    |                |
| StopTime      | datetime              | NO   |     | NULL    |                |
| CallDuration  | float(10,5)           | NO   |     | NULL    |                |
| BillDuration  | mediumint(8) unsigned | NO   |     | NULL    |                |
| CallMinimum   | tinyint(3) unsigned   | NO   |     | NULL    |                |
| CallIncrement | tinyint(3) unsigned   | NO   |     | NULL    |                |
| BasePrice     | float(12,9)           | NO   |     | NULL    |                |
| CallPrice     | float(12,9)           | NO   |     | NULL    |                |
| TransactionId | varchar(20)           | NO   |     | NULL    |                |
| CustomerIP    | varchar(15)           | NO   |     | NULL    |                |
| ANI           | varchar(20)           | NO   |     | NULL    |                |
| ANIState      | varchar(10)           | NO   |     | NULL    |                |
| DNIS          | varchar(20)           | NO   |     | NULL    |                |
| LRN           | varchar(20)           | NO   |     | NULL    |                |
| DNISState     | varchar(10)           | NO   |     | NULL    |                |
| DNISLATA      | varchar(10)           | NO   |     | NULL    |                |
| DNISOCN       | varchar(10)           | NO   |     | NULL    |                |
| OrigTier      | varchar(10)           | NO   |     | NULL    |                |
| TermRateDeck  | varchar(20)           | NO   |     | NULL    |                |
+---------------+-----------------------+------+-----+---------+----------------+

I have the following indexes:

+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name        | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cdrs  |          0 | PRIMARY         |            1 | id          | A         |      269622 |     NULL | NULL   |      | BTREE      |         |               |
| cdrs  |          1 | id              |            1 | id          | A         |      269622 |     NULL | NULL   |      | BTREE      |         |               |
| cdrs  |          1 | call_time_index |            1 | StartTime   | A         |      269622 |     NULL | NULL   |      | BTREE      |         |               |
| cdrs  |          1 | call_time_index |            2 | StopTime    | A         |      269622 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

The query I am running is this:

SELECT MAX(cnt) AS max_channels FROM
  (SELECT cl1.StartTime, COUNT(*) AS cnt
    FROM cdrs cl1
    INNER JOIN cdrs cl2
    ON cl1.StartTime
    BETWEEN cl2.StartTime  AND cl2.StopTime
    GROUP BY cl1.id)
  AS counts;

It seems like I might have to chunk this data for each day and store the results in a separate table like simultaneous_calls.

Upvotes: 1

Views: 95

Answers (2)

spencer7593
spencer7593

Reputation: 108480

The inline view isn't strictly necessary. (You're right about a lot of time to run the EXPLAIN on the query with the inline view, the EXPLAIN will materialize the inline view (i.e. run the inline view query and populate the derived table), and then give an EXPLAIN on the outer query.

Note that this query will return an equivalent result:

SELECT COUNT(*) AS max_channels
  FROM cdrs cl1
  JOIN cdrs cl2
    ON cl1.StartTime BETWEEN cl2.StartTime  AND cl2.StopTime
 GROUP BY cl1.id
 ORDER BY max_channels DESC
 LIMIT 1

Though it still has to do all the work, and probably doesn't perform any better; the EXPLAIN should run a lot faster. (We expect to see "Using temporary; Using filesort" in the Extra column.)


The number of rows in the resultset is going to be the number of rows in the table (~250,000 rows), and those are going to need to be sorted, so that's going to be some time there. The bigger issue (my gut is telling me) is that join operation.

I'm wondering if the EXPLAIN (or performance) would be any different if you swapped the cl1 and cl2 in the predicate, i.e.

ON cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime

I'm thinking that, just because I'd be tempted to try a correlated subquery. That's ~250,000 executions, and that's not likely going to be any faster...

SELECT ( SELECT COUNT(*) 
           FROM cdrs cl2
          WHERE cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
       ) AS max_channels
     , cl1.StartTime
  FROM cdrs cl1
 ORDER BY max_channels DESC
 LIMIT 11

You could run an EXPLAIN on that, we're still going to see a "Using temporary; Using filesort", and it will also show the "dependent subquery"...


Obviously, adding a predicate on the cl1 table to cut down the number of rows to be returned (for example, checking only the past 15 days); that should speed things up, but it doesn't get you the answer you want.

WHERE cl1.StartTime > NOW() - INTERVAL 15 DAY

(None of my musings here are sure-fire answers to your question, or solutions to the performance issue; they're just musings.)

Upvotes: 1

Bill Karwin
Bill Karwin

Reputation: 562711

I'm sure you want to know not only the maximum simultaneous calls, but when that happened.

I would create a table containing the timestamp of every individual minute

CREATE TABLE times (ts DATETIME UNSIGNED AUTO_INCREMENT PRIMARY KEY);
INSERT INTO times (ts) VALUES ('2014-05-14 00:00:00');
. . . until 1440 rows, one for each minute .  . .

Then join that to the calls.

SELECT ts, COUNT(*) AS count FROM times 
JOIN cdrs ON times.ts BETWEEN cdrs.starttime AND cdrs.stoptime 
GROUP BY ts ORDER BY count DESC LIMIT 1;

Here's the result in my test (MySQL 5.6.17 on a Linux VM running on a Macbook Pro):

+---------------------+----------+
| ts                  | count(*) |
+---------------------+----------+
| 2014-05-14 10:59:00 |     1001 |
+---------------------+----------+
1 row in set (1 min 3.90 sec)

This achieves several goals:

  • Reduces the number of rows examined by two orders of magnitude.
  • Reduces the execution time from 3 hours+ to about 1 minute.
  • Also returns the actual timestamp when the highest count was found.

Here's the EXPLAIN for my query:

explain select ts, count(*) from times join cdrs on times.ts between cdrs.starttime and cdrs.stoptime group by ts order by count(*) desc limit 1;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                                          |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
|  1 | SIMPLE      | times | index | PRIMARY       | PRIMARY | 5       | NULL |   1440 | Using index; Using temporary; Using filesort   |
|  1 | SIMPLE      | cdrs  | ALL   | starttime     | NULL    | NULL    | NULL | 260727 | Range checked for each record (index map: 0x4) |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+

Notice the figures in the rows column, and compare to the EXPLAIN of your original query. You can estimate the total number of rows examined by multiplying these together (but that gets more complicated if your query is anything other than SIMPLE).

Upvotes: 2

Related Questions