MySQL query optimization with complex index

I have a database used for simple reverse geocoding. The database relies on a table containing latitude, longitude and place name. Every time a latitude/longitude pair is not present or, better, every time the searched latitude/longitude differs too much from an existing one, I add a new row using the Google Maps reverse geocoding service. Below is the code that generates the address table:

CREATE TABLE `data_addresses` (
    `ID` int(11) NOT NULL COMMENT 'Primary Key',
    `LAT` int(11) NOT NULL COMMENT 'Latitude x 10000',
    `LNG` int(11) NOT NULL COMMENT 'Longitude x 10000',
    `ADDRESS` varchar(128) NOT NULL COMMENT 'Reverse Geocoded Street Address'
) ENGINE=InnoDB DEFAULT CHARSET=utf8; 
ALTER TABLE `data_addresses`
    ADD PRIMARY KEY (`ID`),
    ADD UNIQUE KEY `IDX_ADDRESS_UNIQUE_LATLNG` (`LAT`,`LNG`),
    ADD KEY `IDX_ADDRESS_LAT` (`LAT`),
    ADD KEY `IDX_ADDRESS_LNG` (`LNG`);
ALTER TABLE `data_addresses`
    MODIFY `ID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary Key';

As you can see, the trick is to place indexes on latitude and longitude. Since latitude and longitude are normally floats, we store their values multiplied by 10000 as integers, so each latitude/longitude pair is unique. This implies a resolution of about 50 m, which is sufficient for my needs.

Now the problem: every time I need to know whether a given latitude/longitude (MyLat, MyLon) is already present, I execute the following query:

SELECT `id`, ROUND(SQRT(POW(ABS(`LAT`-ROUND(MyLat*10000)),2)+POW(ABS(`LNG`-ROUND(MyLon*10000)),2))) AS R FROM splc_smarttrk.`data_addresses` ORDER BY R ASC LIMIT 1

This query returns the closest point along with R (the rating): a smaller R means a closer match, so let's say that every time I find an R above 10 I need to add a new row to the address table. The address table currently contains about 615k rows.

The problem is that, despite the indexes I have placed, this query is too slow (it takes about 2 seconds on a 2x Xeon server). Below are the results of EXPLAIN:

Upvotes: 0

Answers (3)

Power Engineering

Reputation: 722

Following the suggestion of Raymond Nijland I modified the query as follows:

SELECT  `id` AS ID,
ROUND(SQRT(POW(ABS(`LAT`-ROUND(NLat*10000)), 2) +
           POW(ABS(`LNG`-ROUND(NLon*10000)), 2))
     ) AS RT INTO  ADDR_ID, RATING
    FROM  splc_smarttrk.`data_addresses`
    WHERE  (`LAT` BETWEEN (ROUND(NLat*10000)-R) AND (ROUND(NLat*10000)+R))
      AND  (`LNG` BETWEEN (ROUND(NLon*10000)-R) AND (ROUND(NLon*10000)+R))
    ORDER BY  RT ASC
    LIMIT  1;

This trick reduces the dataset to 10 records in the worst-case scenario, so the speed is fairly good despite the ORDER BY clause. In fact, I don't really need to know the distance from the existing point; I just need to know whether that distance is above a given limit (here, whether the point is within a 10x10 rectangle, which means R=5).
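If only the presence of a nearby point matters, the ORDER BY can be dropped altogether. The following is a minimal sketch of that existence check, assuming Python with the standard-library sqlite3 module as a stand-in for MySQL. The helper names find_nearby and build_demo_db and the sample row are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

SCALE = 10_000  # coordinates are stored as degrees x 10000, as in the question


def find_nearby(conn, lat, lng, r=5):
    """Return the ID of any stored point within +/- r units of (lat, lng), or None."""
    lat_i = round(lat * SCALE)
    lng_i = round(lng * SCALE)
    # No distance computation and no ORDER BY: the range predicates can be
    # served by the (LAT, LNG) unique index, and LIMIT 1 stops at the first hit.
    row = conn.execute(
        "SELECT ID FROM data_addresses"
        " WHERE LAT BETWEEN ? AND ?"
        "   AND LNG BETWEEN ? AND ?"
        " LIMIT 1",
        (lat_i - r, lat_i + r, lng_i - r, lng_i + r),
    ).fetchone()
    return row[0] if row else None


def build_demo_db():
    """Create an in-memory table shaped like data_addresses with one sample row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_addresses ("
        " ID INTEGER PRIMARY KEY,"
        " LAT INTEGER NOT NULL,"
        " LNG INTEGER NOT NULL,"
        " ADDRESS TEXT NOT NULL,"
        " UNIQUE (LAT, LNG))"
    )
    conn.execute(
        "INSERT INTO data_addresses (LAT, LNG, ADDRESS) VALUES (?, ?, ?)",
        (454642, 91900, "Example address (made up)"),
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(find_nearby(demo, 45.4643, 9.1901))  # a point close to the sample row
    print(find_nearby(demo, 45.4700, 9.1900))  # a point far from the sample row
```

A return value of None would be the signal to call the geocoding service and insert a new row.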

Upvotes: 0

Rick James

Reputation: 142238

Instead of computing the distance (or in addition to), provide a "bounding box". This will be much faster.

Still faster would be the complex code here: mysql.rjweb.org/doc.php/latlng

Once you have UNIQUE KEY IDX_ADDRESS_UNIQUE_LATLNG (LAT, LNG), there is no need for KEY IDX_ADDRESS_LAT (LAT)

*10000 can fit in MEDIUMINT. And it is good to about 16 meters or 52 feet.
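The arithmetic behind those two figures can be checked with a short Python sketch. It assumes a spherical Earth with a mean radius of 6,371 km, and it assumes the 16 m figure refers to the diagonal of one grid cell at the equator; both are assumptions made here, not statements from the linked article.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)
SCALE = 10_000              # stored value = degrees x 10000

# Length in metres of one stored unit (1/10000 of a degree) along a meridian.
step_m = math.radians(1 / SCALE) * EARTH_RADIUS_M

# Diagonal of one grid cell at the equator, where a longitude step is as
# long as a latitude step.
diagonal_m = step_m * math.sqrt(2)

# Signed MEDIUMINT in MySQL covers -8388608..8388607.
MEDIUMINT_MAX = 8_388_607
largest_value = 180 * SCALE  # the largest magnitude is longitude +/-180 degrees

print(f"one unit       ~ {step_m:.1f} m")
print(f"cell diagonal  ~ {diagonal_m:.1f} m")
print(f"fits MEDIUMINT : {largest_value <= MEDIUMINT_MAX}")
```

Under these assumptions one unit is roughly 11 m and the cell diagonal roughly 16 m, and the largest stored value (1,800,000) is well inside the MEDIUMINT range.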

Upvotes: 1

Raymond Nijland

Reputation: 11602

Can't you optimize this by retrieving a fixed dataset of nearby latitudes and longitudes, calculating the rating (R) on that fixed dataset, and picking the smallest one?

P.S. Not tested; it might contain errors in the sorting, but it might help you on your way.

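SELECT
   id
 , ROUND(SQRT(POW(ABS(`LAT` - ROUND([LAT]*10000)), 2) + POW(ABS(`LNG` - ROUND([LNG]*10000)), 2))) AS R
FROM (
  -- Each branch returns id, LAT and LNG so the outer query can compute R.
  -- The parentheses are required for ORDER BY / LIMIT inside a UNION branch.

  -- up to 100 rows at or just below the target latitude
  (SELECT id, LAT, LNG
   FROM data_addresses
   WHERE LAT <= ROUND([LAT]*10000)
   ORDER BY LAT DESC
   LIMIT 100)

  UNION ALL

  -- up to 100 rows at or just above the target latitude
  (SELECT id, LAT, LNG
   FROM data_addresses
   WHERE LAT >= ROUND([LAT]*10000)
   ORDER BY LAT ASC
   LIMIT 100)

  UNION ALL

  -- up to 100 rows at or just below the target longitude
  (SELECT id, LAT, LNG
   FROM data_addresses
   WHERE LNG <= ROUND([LNG]*10000)
   ORDER BY LNG DESC
   LIMIT 100)

  UNION ALL

  -- up to 100 rows at or just above the target longitude
  (SELECT id, LAT, LNG
   FROM data_addresses
   WHERE LNG >= ROUND([LNG]*10000)
   ORDER BY LNG ASC
   LIMIT 100)
)
 AS data_addresses_range
-- A row may appear in more than one branch; that does not change the minimum.
ORDER BY
 R ASC
LIMIT 1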

Upvotes: 2
