Noel Whitemore
Noel Whitemore

Reputation: 794

How do I optimise this DB query to calculate daily totals for SMS messages received

To briefly explain my problem, an SMS gateway connects to my PHP script using a GET request and the message contents are extracted and saved to a MySQL database. The message content includes a timestamp value and this value is saved within the same table. There is only one table used and the columns are ID (int), sender (int), message (varchar), timestamp (timestamp).

The SMS gateway receives messages from approximately 100 different numbers and the maximum number of messages that will be received from each number is 400 each day, although they do not send messages every day. Over the course of the month, the script will probably be saving up to 300,000 messages.

I had to create a browser-based report that generates a table containing a row for each number and then the daily totals for the number of messages received from that number, so the first column in the table shows the number, the second column shows the total number of messages received on the 1st August, the third column shows the total number of messages received on the 2nd August, etc.

The script itself works fine but times-out because of the number of totals that need to be retrieved. I've implemented the queries in the following two ways but neither completes within the maximum script execution time of 60 seconds:

  1. Retrieve a distinct list of the numbers in the database, then loop through the list of numbers and run up to 31 queries for each number to retrieve the daily totals using a COUNT() statement.

  2. Retrieve a distinct list of the numbers in the database, then loop through the list of numbers and run a single query to retrieve the timestamp values for every message received within the month. Once the list of timestamp values has been retrieved, the results are looped through and the date part of the timestamp extracted and an array updated - the array key is the day of the month and the value is the total of messages received in that month. When the timestamp value matches the day the array value is incremented by one.

Neither of these solutions seems ideal because the first solution has to run too many separate queries and the second solution retrieves a significant amount of data which also has to be processed.

Approaches that I'm considering at this stage are to either look at adding some kind of GROUP BY clause or extracting just the date part of the timestamp, saving it to a separate column and then indexing that in addition to the number column which is already indexed.

Thank you for any advice or assistance you can offer. My options as far as optimising the database are limited because I can't change the database type and I can't edit the MySQL configuration file, so I have to make sure my queries are designed correctly and the database schema is optimal.

Upvotes: 0

Views: 135

Answers (1)

spencer7593
spencer7593

Reputation: 108480

In general, it's more efficient to retrieve the results you need with a single query.

In terms of raw MySQL performance, you'd likely be better off with a query like this:

SELECT t.number
     , DATE(t.message_datetime) AS dt
     , COUNT(1)                 AS cnt
  FROM sms_messages t
 WHERE t.message_datetime >= '2014-07-01'
   AND t.message_datetime <  '2014-07-01' + INTERVAL 1 MONTH
 GROUP
    BY t.number
     , DATE(t.message_datetime)

But, what this doesn't give you is the counts in separate columns, and it doesn't give you any rows that have a count of zero. (If a given number doesn't have any messages in the month, then you wouldn't get any rows with that number. And if a given number didn't have any messages on a given date, you wouldn't get a row for that either.)

The query could be modified to return zero counts for missing "date" values for a given number, but that would require more processing on the MySQL server, to generate the appropriate rows. The query could also be modified to return all numbers, but again, that's a more complicated query and more processing.

The query could also be modified to return the count for each day in a separate column, but again, at the cost of more complex SQL and more processing.

So, if you can deal with "missing" rows, then this query is likely the most efficient.


For the best possible raw performance of MySQL, If you had just the "date" portion of the datetime stored as a separate column, then this query:

SELECT t.number
     , t.dt
     , COUNT(1) AS cnt
  FROM sms_messages t
 WHERE t.message_datetime >= '2014-07-01'
   AND t.message_datetime <  '2014-07-01' + INTERVAL 1 MONTH
 GROUP
    BY t.number
     , t.dt

could make use of an appropriate index to optimize the GROUP BY, to avoid the "Using filesort" operation that would otherwise be required:

... ON sms_messages (number, dt)

Upvotes: 1

Related Questions