Danny Tsang
Danny Tsang

Reputation: 55

MySQL Query Optimization with MAX()

I have 3 tables with the following schema:

CREATE TABLE  `devices` (
  `device_id` int(11) NOT NULL auto_increment,
  `name` varchar(20) default NULL,
  `appliance_id` int(11) default '0',
  `sensor_type` int(11) default '0',
  `display_name` VARCHAR(100),
  PRIMARY KEY  USING BTREE (`device_id`)
) 

CREATE TABLE  `channels` (
  `channel_id` int(11) NOT NULL AUTO_INCREMENT,
  `device_id` int(11) NOT NULL,
  `channel` varchar(10) NOT NULL,
  PRIMARY KEY (`channel_id`),
  KEY `device_id_idx` (`device_id`)
) 

CREATE TABLE  `historical_data` (
  `date_time` datetime NOT NULL,
  `channel_id` int(11) NOT NULL,
  `data` float DEFAULT NULL,
  `unit` varchar(10) DEFAULT NULL,
  KEY `devices_datetime_idx` (`date_time`) USING BTREE,
  KEY `channel_id_idx` (`channel_id`)
)

The setup is that a device can have one or more channels and each channel has many (historical) data.

I use the following query to get the last historical data for one device and all it's related channels:

SELECT c.channel_id, c.channel, max(h.date_time), h.data 
FROM devices d 
INNER JOIN channels c ON c.device_id = d.device_id 
INNER JOIN historical_data h ON h.channel_id = c.channel_id 
WHERE d.name = 'livingroom' AND d.appliance_id = '0'
AND d.sensor_type = 1 AND ( c.channel = 'ch1') 
GROUP BY c.channel
ORDER BY h.date_time, channel

The query plan looks as follows:

+----+-------------+-------+--------+-----------------------+----------------+---------+---------------------------+--------+-------------+
| id | select_type | table | type   | possible_keys         | key            | key_len | ref                       | rows   | Extra       |
+----+-------------+-------+--------+-----------------------+----------------+---------+---------------------------+--------+-------------+
|  1 | SIMPLE      | c     | ALL    | PRIMARY,device_id_idx | NULL           | NULL    | NULL                      |     34 | Using where |
|  1 | SIMPLE      | d     | eq_ref | PRIMARY               | PRIMARY        | 4       | c.device_id               |      1 | Using where |
|  1 | SIMPLE      | h     | ref    | channel_id_idx        | channel_id_idx | 4       | c.channel_id              | 322019 |             |
+----+-------------+-------+--------+-----------------------+----------------+---------+---------------------------+--------+-------------+
3 rows in set (0.00 sec)

The above query is currently taking approximately 15 secs and I wanted to know if there are any tips or way to improve the query?

Edit: Example data from historical_data

+---------------------+------------+------+------+
| date_time           | channel_id | data | unit |
+---------------------+------------+------+------+
| 2011-11-20 21:30:57 |         34 | 23.5 | C    |
| 2011-11-20 21:30:57 |          9 |   68 | W    |
| 2011-11-20 21:30:54 |         34 | 23.5 | C    |
| 2011-11-20 21:30:54 |          5 |  316 | W    |
| 2011-11-20 21:30:53 |         34 | 23.5 | C    |
| 2011-11-20 21:30:53 |          2 |   34 | W    |
| 2011-11-20 21:30:51 |         34 | 23.4 | C    |
| 2011-11-20 21:30:51 |          9 |   68 | W    |
| 2011-11-20 21:30:49 |         34 | 23.4 | C    |
| 2011-11-20 21:30:49 |          4 |  193 | W    |
+---------------------+------------+------+------+
10 rows in set (0.00 sec)

Edit 2: Mutliple channel SELECT example:

SELECT c.channel_id, c.channel, max(h.date_time), h.data 
FROM devices d 
INNER JOIN channels c ON c.device_id = d.device_id 
INNER JOIN historical_data h ON h.channel_id = c.channel_id 
WHERE d.name = 'livingroom' AND d.appliance_id = '0'
AND d.sensor_type = 1 AND ( c.channel = 'ch1' OR c.channel = 'ch2' OR c.channel = 'ch2') 
GROUP BY c.channel
ORDER BY h.date_time, channel

I've used OR in the c.channel where clause because it was easier to generated pro grammatically but it can be changed to use IN if necessary.

Edit 3: Example result of what I'm trying to achieve:

+-----------+------------+---------+---------------------+-------+
| device_id | channel_id | channel | max(h.date_time)    | data  |
+-----------+------------+---------+---------------------+-------+
|        28 |          9 | ch1     | 2011-11-21 20:39:36 |     0 |
|        28 |         35 | ch2     | 2011-11-21 20:30:55 | 32767 |
+-----------+------------+---------+---------------------+-------+

I have added the device_id to the example but my select will only need to return channel_id, channel, last date_time i.e max and the data. The results should be the last record from the historical_data table for each channel for one device.

Upvotes: 1

Views: 174

Answers (3)

Danny Tsang
Danny Tsang

Reputation: 55

It seems that removing an re-creating the index on date_time by deleting and creating it again sped up my original SQL up to around 2secs

Upvotes: 1

DRapp
DRapp

Reputation: 48129

I would first add an index on the devices table ( appliance_id, sensor_type, name ) to match your query. I don't know how many entries are in this table, but if large, and many elements per device, get right to it.

Second, on your channels table, index on ( device_id, channel )

Third, on your history data, index on ( channel_id, date_time )

then,

SELECT STRAIGHT_JOIN
      PreQuery.MostRecent,
      PreQuery.Channel_ID,
      PreQuery.Channel,
      H2.Data,
      H2.Unit
   from 
      ( select 
              c.channel_id,
              c.channel, 
              max( h.date_time ) as MostRecent
           from 
              devices d

                 join channels c
                    on d.device_id = c.device_id
                    and c.channel in ( 'ch1', 'ch2', 'ch3' )

                    join historical_data h
                      on c.channel_id = c.Channel_id
           where
                  d.appliance_id = 0
              and d.sensor_type = 1
              and d.name = 'livingroom'

           group by
              c.channel_id ) PreQuery 

      JOIN Historical_Data H2
         on PreQuery.Channel_ID = H2.Channel_ID
        AND PreQuery.MostRecent = H2.Date_Time
   order by 
      PreQuery.MostRecent,
      PreQuery.Channel

Upvotes: 0

Nonym
Nonym

Reputation: 6299

I haven't been able to test this, so I'd like to ask you to run it and let us know what happens.. if it gives you the desired result and if it runs faster than your current:

CREATE DEFINER=`root`@`localhost` PROCEDURE `GetLatestHistoricalData_EXAMPLE`
  (
      IN param_device_name VARCHAR(20)
    , IN param_appliance_id INT
    , IN param_sensor_type INT
    , IN param_channel VARCHAR(10)
  )
BEGIN

    SELECT 
        h.date_time, h.data 
    FROM 
        historical_data h
        INNER JOIN
        (
            SELECT c.channel_id
            FROM devices d 
            INNER JOIN channels c ON c.device_id = d.device_id 
            WHERE 
                d.name = param_device_name
            AND d.appliance_id = param_appliance_id
            AND d.sensor_type = param_sensor_type
            AND c.channel = param_channel
        ) 
        c ON h.channel_id = c.channel_id 
    ORDER BY h.date_time DESC
    LIMIT 1;

END

Then to run a test:

CALL GetLatestHistoricalData_EXAMPLE ('livingroom', 0, 1, 'ch1');

I tried working it into a stored procedure so that even if you get the desired results using this for one device, you can try it with another device and see the results... Thanks!

[edit] : : In response to Danny's comment here's an updated test version:

CREATE DEFINER=`root`@`localhost` PROCEDURE `GetLatestHistoricalData_EXAMPLE_3Channel`
  (
      IN param_device_name VARCHAR(20)
    , IN param_appliance_id INT
    , IN param_sensor_type INT
    , IN param_channel_1 VARCHAR(10)
    , IN param_channel_2 VARCHAR(10)
    , IN param_channel_3 VARCHAR(10)
  )
BEGIN

    SELECT 
        h.date_time, h.data 
    FROM 
        historical_data h
        INNER JOIN
        (
            SELECT c.channel_id
            FROM devices d 
            INNER JOIN channels c ON c.device_id = d.device_id 
            WHERE 
                d.name = param_device_name
            AND d.appliance_id = param_appliance_id
            AND d.sensor_type = param_sensor_type
            AND (
                c.channel IN (param_channel_1
                             ,param_channel_2
                             ,param_channel_3
                ) 
        c ON h.channel_id = c.channel_id 
    ORDER BY h.date_time DESC
    LIMIT 1;

END

Then to run a test:

CALL GetLatestHistoricalData_EXAMPLE_3Channel ('livingroom', 0, 1, 'ch1', 'ch2' , 'ch3');

Again, this is just for testing, so you'll be able to see if it meets your needs..

Upvotes: 0

Related Questions