Antilogical

Reputation: 10408

Optimization techniques for select query on single table with ~2.25M rows?

I have a MySQL table running on the InnoDB engine called squares that has roughly 2,250,000 rows with the following table structure:

CREATE TABLE `squares` (
   `square_id` int(7) unsigned NOT NULL,
   `ref_coord_lat` double(8,6) NOT NULL,
   `ref_coord_long` double(9,6) NOT NULL,
   PRIMARY KEY (`square_id`),
   KEY `ref_coord_lat` (`ref_coord_lat`),
   KEY `ref_coord_long` (`ref_coord_long`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The first column, square_id, holds a simple incrementing value from 0 to 2.25M, while ref_coord_lat and ref_coord_long hold the latitude and longitude of a point in decimal degrees, respectively.

This is a read-only table. No additional rows will be added, and the only query which needs to be run against it is the following:

SELECT * FROM `squares` WHERE 
  `ref_coord_lat` BETWEEN :southLat AND :northLat AND 
  `ref_coord_long` BETWEEN :westLong AND :eastLong

...where the values following the colons are PHP PDO placeholders. Essentially, the goal of this query is to fetch all coordinate points in the table that fall within the current viewport of a Google Maps window, which is bounded by the four coordinates in the query.

I've limited the zoom level at which this query runs via the Google Maps API, so that the maximum number of rows that can be fetched is ~5,600. As the zoom level increases, the result count decreases significantly.

Running such an example query directly in phpMyAdmin takes 1.40-1.45 seconds. This is far too long. I already have single-column indexes on ref_coord_lat and ref_coord_long, which brought the query time down from ~5 seconds, but this is still much too slow for a map where the end user expects a timely response.

My question is simply: How can I further optimize this table/query to increase the speed at which results are fetched?

Upvotes: 6

Views: 1581

Answers (4)

nhahtdh

Reputation: 56809

The content here was originally written by the OP (Antilogical) as an edit to the question. I moved the answer portion from the question to here and made this a community wiki. @Antilogical, please ping me with a comment if you want to post your own answer to gain reputation.

Well, I fixed it. Here's how:

I managed to reduce my query time from an initial ~5s down to 0.6-0.7ms. I stumbled upon the question 'How to further optimize this MySQL table for a single Query', which led me to switch my table from InnoDB to MyISAM and to use the geospatial extensions to represent my coordinate points.

Firstly, I switched from InnoDB to MyISAM, which (unlike InnoDB at the time) supports the SPATIAL indexes used by MySQL's spatial extensions:

ALTER TABLE `squares` ENGINE=MyISAM;

Then, I created a geospatial column called coordinate holding a POINT object (which is simply a combination of ref_coord_lat and ref_coord_long):
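The statement that creates the column isn't shown in the original post; standard MySQL DDL along these lines would do it (the NOT NULL constraint is added later, just before the spatial index):

-- Reconstructed: add the POINT column that the UPDATE below populates
ALTER TABLE `squares` ADD COLUMN `coordinate` POINT;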

UPDATE `squares` SET `coordinate` = GeomFromText(CONCAT('POINT(', `ref_coord_lat`,' ', `ref_coord_long`, ')'));

I then added a spatial index to coordinate, which dramatically increased the query performance.
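The index statement itself also isn't shown in the original post; on MyISAM it would look something like this (reconstructed - MySQL requires the geometry column to be NOT NULL before it accepts a SPATIAL index):

-- Reconstructed: make the column NOT NULL, then add the spatial index
ALTER TABLE `squares` MODIFY `coordinate` POINT NOT NULL;
ALTER TABLE `squares` ADD SPATIAL INDEX (`coordinate`);

Initially, while not using the geospatial extensions, my query to select fields from my database was: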

SELECT * FROM `squares` WHERE `ref_coord_lat` BETWEEN *somecoordinate* AND *somecoordinate* AND `ref_coord_long` BETWEEN *somecoordinate* and *somecoordinate*

What this query essentially does is imitate a bounding box by putting two limits/conditions on each axis (latitude & longitude). Note that * at this point means all three fields of my database, as I hadn't created coordinate yet. When I switched to using the MySQL spatial extensions, I could check against my new coordinate column using a Minimum Bounding Rectangle and the function MBRContains(), which is part of the MySQL geospatial extension set. (Note that the POLYGON text must describe a closed ring, so the first corner is repeated as the final point.)

SELECT * FROM `squares` WHERE MBRContains(GeomFromText('POLYGON((*my four bounds now go here in lat,lon pairs*))'), `coordinate`);

Notice how I'm still selecting all fields with *? You don't need to do that. The coordinate column is merely acting as an index to look values up against, so I now select everything but that column via the query below, which gives a significant speed increase over the one directly above.

SELECT `square_id`, `ref_coord_lat`, `ref_coord_long` FROM `squares` WHERE MBRContains(GeomFromText('POLYGON((*my four bounds now go here in lat,lon pairs*))'), `coordinate`);

There's been a multiple-orders-of-magnitude increase in speed:

  • ~5s - Time to query initial squares table (InnoDB, no indices)
  • 1.40-1.45s - Addition of two indices for ref_coord_lat & ref_coord_long
  • 0.9s (900ms) - I then discovered that the constraint coordinates in my SELECT query had more than a dozen decimal places. I rounded them to 6 in my JavaScript code (the same number of decimal places my table stores coordinates in). This provided a decent speed increase.
  • 0.5s (500ms) - As per @N.B.'s comment below, I increased innodb_buffer_pool_size from 16M to 256M (see the config snippet after this list).
  • 45-50ms - Switched to the MyISAM engine, added the coordinate POINT column and added a spatial index.
  • 0.6-0.7ms - I altered my query so that instead of selecting all columns with *, it selects every column except the new coordinate column.
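For reference, the buffer-pool setting mentioned above lives in the MySQL configuration file (my.cnf / my.ini); on MySQL versions of that era it could not be changed at runtime, so a server restart is required. A minimal sketch:

[mysqld]
innodb_buffer_pool_size = 256M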

Database optimization? Done.

Upvotes: 3

mvp

Reputation: 116078

Creating a compound index on (ref_coord_lat, ref_coord_long) should help a lot.
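For example (the index name is illustrative):

ALTER TABLE `squares` ADD INDEX `idx_lat_long` (`ref_coord_lat`, `ref_coord_long`);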

However, the right solution is to take a look at the MySQL spatial extensions. Spatial support was created specifically to deal with two-dimensional data and queries against such data. If you create appropriate spatial indexes, your typical query performance should easily exceed that of a compound index on (lat, long).

Upvotes: 3

Thibault

Reputation: 1596

Your structure seems quite OK. 2.25M rows is not that much. Your rows are small, and the comparisons you do are only on double values. It should be faster.

Try running the ANALYZE, OPTIMIZE, CHECK and REPAIR commands on your table to make sure your indexes are correctly constructed.
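Run against this table, those statements look like the following (note that REPAIR TABLE applies to MyISAM tables, not InnoDB):

ANALYZE TABLE `squares`;
OPTIMIZE TABLE `squares`;
CHECK TABLE `squares`;
REPAIR TABLE `squares`;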

Once this is done, you should investigate deeper into the system: what exactly is slowing the query down?

Use monitoring to gather data about your SQL cache, memory usage, and so on. It will help you diagnose the issue.

Good luck with your project.

Upvotes: 3

Audrius Meškauskas

Reputation: 21728

While not very elegant, splitting the data into multiple tables (say, one per 30 degrees of latitude) usually helps. Your query is such that it is obvious which table(s) contain the required points.
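A minimal sketch of that idea, with illustrative table names (the application would pick the right band table(s) from the latitude bounds):

-- One table per 30-degree latitude band, same structure and indexes as `squares`
CREATE TABLE `squares_lat_0_30` LIKE `squares`;
CREATE TABLE `squares_lat_30_60` LIKE `squares`;
-- ...and so on for the remaining bands

INSERT INTO `squares_lat_0_30`
SELECT * FROM `squares` WHERE `ref_coord_lat` >= 0 AND `ref_coord_lat` < 30;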

Also, use EXPLAIN to investigate the problem.
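For instance, against the original query (the coordinate bounds here are just example values):

EXPLAIN SELECT * FROM `squares`
WHERE `ref_coord_lat` BETWEEN 48.10 AND 48.15
  AND `ref_coord_long` BETWEEN 11.50 AND 11.60;

The output shows which index, if any, MySQL picks and roughly how many rows it expects to examine.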

Upvotes: 0
