Reputation: 2318
I have a table with some hundred million records, with about 15 different columns. The data is among others speed recordings, with a timestamp, an id on the vehicle that is tracked and a speed recording. A single vehicle can have hundreds of thousands speed recordings, and there is about 40 000 different vehicles.
I want to find the maximum speed recording for each single vehicle.
Just querying:
SELECT userid, max(speed) from SpeedReadings group by userid
Takes a lot of time, so I want to optimize this. My first notion is to make a index with userid and speed. Does the order of the variables in the query, and the index table matter?
Is there speed difference between those two queries:
SELECT userid, max(speed) from SpeedReadings group by userid
SELECT distinct userid, max(speed) from SpeedReadings
Upvotes: 0
Views: 422
Reputation: 180010
The best way to optimize the grouping is to have the data stored in a list that is sorted by the user ID, so that the database does not need to keep all possible results around.
In other words, you need an index where userid
is the first column.
To speed up the computation further, append the speed
column to the index to get a covering index.
The order of columns in the query does not matter.
There is a speed difference with the query SELECT distinct userid, max(speed) from SpeedReadings
, but I guess you want to get the correct result, so the difference does not matter.
Upvotes: 1