swalog

Reputation: 4536

Why would MIN() query be slower than ORDER BY X LIMIT 1?

To start off, I have seen: Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?

It looks like the same question, but the issue there is lack of indices. So let me clarify my case.


To generalize, I will simplify my two queries:

-- min:
SELECT min(id) FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY;
-- exec time: ~0.260 s

-- order-limit:
SELECT id FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY ORDER BY s_time, id LIMIT 1;
-- exec time: ~0.060 s

Here, id is the primary key, and s_time is an indexed timestamp.
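
For reference, a minimal schema along the lines described above (column types here are placeholders, not the real definition) would look something like:

-- Sketch only: id is the PRIMARY KEY, s_time is an indexed timestamp.
CREATE TABLE my_table (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  s_time TIMESTAMP NOT NULL,
  -- other columns omitted
  PRIMARY KEY (id),
  KEY s_time (s_time)
) ENGINE=InnoDB;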

Running EXPLAIN FORMAT=JSON shows that the difference between these two queries is that the order-limit version has an ordering_operation that says using_filesort: false. Both show essentially the same query_cost.
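
For reference, the plans quoted below came from statements of this form (min case shown; the order-limit case is analogous):

EXPLAIN FORMAT=JSON
SELECT min(id) FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY;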

Now, my understanding is that if a column is indexed, it is kept ordered in a B-tree, and that each index entry also carries the corresponding primary key. Finding the first one (LIMIT 1) should be a simple traversal of the B-tree, and quite quick.

However, MIN(primary_key) FROM foo WHERE indexed_entry > bar should be handled in the same way. Is this simply a case of poor optimization by InnoDB?


If LIMIT has a special optimization case where the server analyzes the memory requirements for the requested number of entries and, if possible, uses a priority queue instead of quicksort, shouldn't MIN() fall under that same case, as if it were LIMIT 1?
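
Side note, not something I have from any documentation: one way to see which path the server actually takes for each query is the optimizer trace (assuming MySQL 5.7+ with OPTIMIZER_TRACE available). For the order-limit query the trace should reflect that the sort is avoided via the index, matching using_filesort: false; when a filesort does run, it should record whether the priority-queue path from check_if_pq_applicable() was chosen.

-- Debugging sketch: enable the trace, run the query, then read the trace for this session.
SET optimizer_trace = 'enabled=on';
SELECT id FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY ORDER BY s_time, id LIMIT 1;
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';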


EXPLAIN differences:

min-case:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "91987.68"
    },
    "table": {
      "table_name": "my_table",
      "access_type": "range",
      "possible_keys": [
        "s_time"
      ],
      "key": "s_time",
      "used_key_parts": [
        "s_time"
      ],
      "key_length": "4",
      "rows_examined_per_scan": 229128,
      "rows_produced_per_join": 229128,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "46162.08",
        "eval_cost": "45825.60",
        "prefix_cost": "91987.68",
        "data_read_per_join": "104M"
      },
      "used_columns": [
        "id",
        "s_time"
      ],
      "attached_condition": "(`db`.`my_table`.`s_time` >= <cache>((now() - interval 14 day)))"
    }
  }
}

order-limit:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "92215.71"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "my_table",
        "access_type": "range",
        "possible_keys": [
          "s_time"
        ],
        "key": "s_time",
        "used_key_parts": [
          "s_time"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 229696,
        "rows_produced_per_join": 229696,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "46276.51",
          "eval_cost": "45939.20",
          "prefix_cost": "92215.71",
          "data_read_per_join": "105M"
        },
        "used_columns": [
          "id",
          "s_time"
        ],
        "attached_condition": "(`db`.`my_table`.`started_time` >= <cache>((now() - interval 14 day)))"
      }
    }
  }
}

Interesting related documentation: method bool check_if_pq_applicable() in https://dev.mysql.com/doc/dev/mysql-server/8.0.0/filesort_8cc.html

DESCRIPTION: Given a query like this:

SELECT ... FROM t ORDER BY a1,...,an LIMIT max_rows;

This function tests whether a priority queue should be used to keep the result. Necessary conditions are:

- estimate that it is actually cheaper than merge-sort
- enough memory to store the records

Upvotes: 0

Views: 143

Answers (1)

Rick James

Reputation: 142433

They do different things, hence one has to work harder.

SELECT  min(id)
    FROM  my_table
    WHERE  s_time >= now() - INTERVAL 14 DAY;

Searches all of the items in the last two weeks to find the lowest id. INDEX(s_time, id) will help some.
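
If you want to test that, such an index can be added like this (the index name is arbitrary):

ALTER TABLE my_table ADD INDEX s_time_id (s_time, id);

(As noted below, with InnoDB a secondary INDEX(s_time) already carries the primary key, so this mainly makes the (s_time, id) ordering explicit.)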

SELECT  id
    FROM  my_table
    WHERE  s_time >= now() - INTERVAL 14 DAY
    ORDER BY  s_time, id
    LIMIT  1;

If you have INDEX(s_time, id), then it will look at only one row -- the first one from 14 days ago. No scan. No checking to see if it is the smallest id.

Note: If you have PRIMARY KEY(id), INDEX(s_time), then that index is effectively (s_time, id).

Since you probably inserted the rows in s_time order, the results will probably be the same. But the Optimizer has no way of knowing that.
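
Here is a tiny hypothetical illustration of why the two queries are not guaranteed to agree -- a backfilled row with an earlier s_time but a larger id:

-- Hypothetical data: the earliest s_time in the window does not have the smallest id.
INSERT INTO my_table (id, s_time) VALUES
  (100, NOW() - INTERVAL 1 DAY),    -- smallest id, newer s_time
  (205, NOW() - INTERVAL 10 DAY);   -- backfilled: larger id, earlier s_time

-- Over the 14-day window, MIN(id) returns 100,
-- while ORDER BY s_time, id LIMIT 1 returns the row with id = 205.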

Upvotes: 1
