Query execution time on MySQL 8

Question

I recently upgraded to MyQSL 8 from 5.7 (by spinning up a snapshot of my production RDS instance with the updated engine, if that makes any difference) and I'm having an issue on a ~64M rows table (raw_records), when querying for the latest record available for each weather station.

| station_id | tstamp              | temp     | rel_hum  |
|------------|---------------------|----------|----------|
| 1          | 2022-12-06 07:00:00 | 23.40000 | 34.00000 |
| 1          | 2022-12-06 08:00:00 | 23.70000 | 34.00000 |
| 2          | 2022-12-06 07:00:00 | 20.20000 | 43.00000 |
| 2          | 2022-12-06 08:00:00 | 19.80000 | 38.00000 |
   ...

The query (#1 below) used to work fine, taking few hundreds of ms to execute for up about 350 station IDs. After the database update however the query took too long so the request failed due to timeout. It turns out that the running time is now somehow proportional to the number of station IDs passed: with just 10 IDs the original query takes now ~1.5sec, with 100 stations about 20sec and so on. After some digging and experimenting it turns out the raw query below (#2) works just fine, executing in about 150-200ms for any number of stations.

Below you can see the execution time compared when running the two queries one after the other (using Laravel v9.x):


$station_ids = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ];

DB::enableQueryLog();

// original query (#1)
RawRecord::selectRaw('`station_id`, MAX(`tstamp`) AS `latest_record`')
    ->whereIn('station_id', $station_ids->toArray())
    ->groupBy('station_id')
    ->get();

// new query (#2)
DB::select("
    SELECT `station_id`, MAX(`tstamp`) AS `latest_record`
    FROM `raw_records`
    WHERE `station_id` IN ( {$station_ids->join(',')} )
    GROUP BY `station_id`
");

dd(DB::getQueryLog());

[
  [
    "query" => "select `station_id`, MAX(`tstamp`) AS `latest_record` from `raw_records` where `station_id` in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) group by `station_id`"
    "bindings" => [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
    "time" => 1524.56
  ],
  [
    "query" => """
          SELECT `station_id`, MAX(`tstamp`) AS `latest_record`
          FROM `raw_records`
          WHERE `station_id` IN ( 1,2,3,4,5,6,7,8,9,10 )
          GROUP BY `station_id`
      """
    "bindings" => []
    "time" => 148.63
  ]
]

Now, my question is why is this happening, what could be the issue here? It appears the resulting executed SQL command is the same exact one, the only difference is the first one is created via QueryBuilder.

Edit

This is the output of explain for both queries; I'm not familiar with this statement and how to interpret the result, however I would say there is some issue related to the PK and indexes used to perform the query. Just to clarify I do define the PK in the model class:

// RawRecord
protected $primaryKey = ['station_id', 'tstamp'];

RawRecord::selectRaw('`station_id`, MAX(`tstamp`) AS `latest_record`')
    ->whereIn('station_id', $station_ids->toArray())
    ->groupBy('station_id')
    ->explain();

[
  "id" => 1
  "select_type" => "SIMPLE"
  "table" => "raw_records"
  "partitions" => null
  "type" => "range"
  "possible_keys" => "PRIMARY,station_id,tstamp"
  "key" => "PRIMARY"
  "key_len" => "4"
  "ref" => null
  "rows" => 123280
  "filtered" => 100.0
  "Extra" => "Using where; Using index"
]

DB::select("
    EXPLAIN SELECT `station_id`, MAX(`tstamp`) AS `latest_record`
    FROM `raw_records`
    WHERE `station_id` IN ( {$station_ids->join(',')} )
    GROUP BY `station_id`
");

[
  "id" => 1
  "select_type" => "SIMPLE"
  "table" => "raw_records"
  "partitions" => null
  "type" => "range"
  "possible_keys" => "PRIMARY,station_id,tstamp"
  "key" => "station_id"
  "key_len" => "4"
  "ref" => null
  "rows" => 368
  "filtered" => 100.0
  "Extra" => "Using where; Using index for group-by"
]

Query execution time on MySQL 8

Answers (1)

Related Questions