Reputation: 803
I'm using the following query to populate a table of daily temperature extremes from a large set of readings. There are 33 thermometers, and readings are taken every minute; thus there are about 46K readings (i.e. rows) per day, but only 33 rows get added to extremes each day.
Originally I had planned to run this query each time a new reading was inserted, in order to keep today's extremes up to date. However, I soon discovered this query takes a LONG time to run: about 5½ minutes on my MacBook for a full day's readings.
I'd be very interested in some insight into why it's so slow, and perhaps how to make this query faster, or a better alternative. Note extremes has both Sensor_ID and Date as primary keys, since that's what's unique about each row.
THANKS !!
insert into extremes(Date, Sensor_ID, `min`, `max`, `avg`)
select date(DateTime) as `Date`, Sensor_ID as Sensor_ID,
min(Value) as `min`, max(Value) as `max`, avg(Value) as `avg`
from readings where date(`DateTime`) = date(NOW())
group by date(DateTime), Sensor_ID
on duplicate key update
`min` = values(`min`), `max` = values(`max`), `avg` = values(`avg`);
As requested, here are the table definitions:
CREATE TABLE `readings` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`Sensor_ID` int(11) NOT NULL,
`DateTime` datetime NOT NULL,
`Value` double NOT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `ID_UNIQUE` (`ID`),
KEY `ID_idx` (`Sensor_ID`),
CONSTRAINT `ID` FOREIGN KEY (`Sensor_ID`) REFERENCES `sensors` (`ID`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=54500039 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `extremes` (
`Date` datetime NOT NULL,
`Sensor_ID` int(11) NOT NULL,
`min` double DEFAULT NULL,
`max` double DEFAULT NULL,
`avg` double DEFAULT NULL,
`updates` int(11) DEFAULT '0',
PRIMARY KEY (`Date`,`Sensor_ID`),
KEY `ID_idx` (`Sensor_ID`),
CONSTRAINT `foo` FOREIGN KEY (`Sensor_ID`) REFERENCES `sensors` (`ID`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Upvotes: 0
Views: 429
Reputation: 142208
The UNIQUE KEY `ID_UNIQUE` (`ID`) slows down modifications to readings. It is redundant, since the PRIMARY KEY is already a unique key. Drop it.
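For example:

ALTER TABLE readings DROP INDEX ID_UNIQUE;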
Do the IODKU only on the one row you are inserting, not all the rows:
insert into extremes(Date, Sensor_ID, `min`, `max`)
VALUES(... , ..., ..., ...) -- Place constants here (from the sensor reading)
on duplicate key update
    `min` = LEAST(`min`, values(`min`)),
    `max` = GREATEST(`max`, values(`max`));
Then have a nightly job to set the avg.
That way, you are touching 1 row, not up to 1440.
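A sketch of that nightly job, assuming it runs shortly after midnight and that the day's rows already exist in extremes:

-- Fill in yesterday's averages in one pass.
INSERT INTO extremes(`Date`, Sensor_ID, `avg`)
SELECT DATE(DateTime), Sensor_ID, AVG(Value)
FROM readings
WHERE DateTime >= CURDATE() - INTERVAL 1 DAY
  AND DateTime <  CURDATE()
GROUP BY DATE(DateTime), Sensor_ID
ON DUPLICATE KEY UPDATE `avg` = VALUES(`avg`);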
Another technique is to gather the readings for a minute, then apply them in a single query.
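For example, with the minute's 33 readings in hand (the values below are made up), one statement updates all 33 rows:

INSERT INTO extremes(`Date`, Sensor_ID, `min`, `max`)
VALUES (CURDATE(),  1, 20.6, 20.6),
       (CURDATE(),  2, 18.9, 18.9),
       -- ... one row per sensor; each reading serves as both min and max candidate ...
       (CURDATE(), 33, 22.4, 22.4)
ON DUPLICATE KEY UPDATE
    `min` = LEAST(`min`, VALUES(`min`)),
    `max` = GREATEST(`max`, VALUES(`max`));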
Do you have millions of sensors? Rethink the use of a 4-byte INT for Sensor_ID; there are smaller integers.
Where'd you find those sensors? I doubt you need more than the 7 significant digits of FLOAT (4 bytes) instead of the 8-byte DOUBLE.
My point about the data types: shrinking the data will also speed things up, especially if you get to the point of having too much data to cache in RAM.
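Something like this (a sketch; SMALLINT UNSIGNED allows up to 65,535 sensors, and the referenced sensors.ID plus the foreign key would need the same change, so the FK may have to be dropped first):

ALTER TABLE readings
    MODIFY Sensor_ID SMALLINT UNSIGNED NOT NULL,  -- 2 bytes instead of 4
    MODIFY `Value` FLOAT NOT NULL;                -- 4 bytes instead of 8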
Phrasing: "both Sensor_ID and Date as primary keys" implies that there are two different PKs, which is impossible. Instead say "Sensor_ID and Date form a composite primary key". And, yes, that is what you need for that table. Whether you put Date first or last depends on what your typical SELECT is.
FOREIGN KEYs are another cost: each time an insert is done, the other table must be checked to verify the existence of the id. By now you have debugged your code sufficiently; FKs are arguably a waste.
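If you decide they are no longer worth it, they can be dropped by constraint name (the names below come from your SHOW CREATE TABLE output):

ALTER TABLE readings DROP FOREIGN KEY `ID`;
ALTER TABLE extremes DROP FOREIGN KEY `foo`;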
The avg could be computed every minute, but (1) it is somewhat meaningless until the day is over, and (2) it would require an extra column (with the count).
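A sketch of that, assuming a new running `sum` column is added next to the existing updates counter:

INSERT INTO extremes(`Date`, Sensor_ID, `min`, `max`, `sum`, updates)
VALUES (CURDATE(), 1, 20.6, 20.6, 20.6, 1)  -- one hypothetical reading from sensor 1
ON DUPLICATE KEY UPDATE
    `min`   = LEAST(`min`, VALUES(`min`)),
    `max`   = GREATEST(`max`, VALUES(`max`)),
    `sum`   = `sum` + VALUES(`sum`),
    updates = updates + 1,
    `avg`   = `sum` / updates;  -- MySQL evaluates these assignments left to right,
                                -- so avg sees the updated sum and updates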
Upvotes: 1
Reputation: 1798
Add an index to the DateTime column in your readings table.
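For example (the index name is arbitrary):

ALTER TABLE readings ADD INDEX dt_idx (`DateTime`);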
Then try the SQL below; comparing the indexed DateTime column directly against a constant (instead of wrapping it in date()) lets MySQL use that index:
insert into extremes(Date, Sensor_ID, `min`, `max`, `avg`)
select date(DateTime) as `Date`, Sensor_ID as Sensor_ID,
min(Value) as `min`, max(Value) as `max`, avg(Value) as `avg`
from readings where `DateTime` >= date_format(curdate(), '%Y-%m-%d 00:00:00')
group by date(DateTime), Sensor_ID
on duplicate key update
`min` = values(`min`), `max` = values(`max`), `avg` = values(`avg`);
Upvotes: 2