teloon
teloon

Reputation: 238

weird behavior: less rows are scanned without index on the columns in where condition

I have a table called A which store the longitude and latitude data. And there's a query to get rows whose location is within some geo rectangle, e.g.:

SELECT *
FROM A
WHERE A.longitude > -121.276052
    AND A.longitude < -110.159143
    AND A.latitude > 32.802275
    AND A.latitude < 38.335916
ORDER BY A.id
LIMIT 100

Table A only have index on id column, which is also the primary key. I would expect a big amount of rows be scanned because of no index on longitude and latitude columns. However, it turns out it only scans 100 rows from the output of explain.

(Also I verified the ids of output rows are not accidentally the smallest. They're actually in the middle of id range)

Why come mysql can accurately identifies the 100 rows with id index only?

Upvotes: 0

Views: 45

Answers (2)

Rick James
Rick James

Reputation: 142433

The short answer: EXPLAIN's "Rows" cannot be trusted.

The long answer:

The following can be trusted:

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

If you see 100 reads (or maybe 99 or 101), then it is nice and efficient. But I suspect you will see a larger number, but less than the number of rows in the table. However, if there are fewer than 100 rows in that bounding box, it will have to scan the entire table.

Partial speedup

INDEX(latitude)

will help some. That is generally "good enough" for medium-sized lat-lng tables. This should be an "order of magnitude" faster than what you have.

A "late fetch" may help a little bit more:

SELECT *
    FROM A
    JOIN ( SELECT id FROM A WHERE lat... AND lng... ) AS x USING(id)
-- and have this "covering" index for the subquery:
INDEX(latitude, longitude, id)

Further speedup

If you have a large table, then the solution becomes more complex; I discuss how to use partitioning to get a 2D index. It includes stored routines that are another order of magnitude faster.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270583

I'm not 100% sure what the question is. In your query, MySQL is scanning the index for the order by -- it is getting all rows in id order. Then, it is looking up each row in the data pages to see if it matches the where clause.

Presumably, the first 100 rows in the index match the where clause. Hence, MySQL can stop scanning after 100 rows.

Upvotes: 2

Related Questions