Reputation: 722
I have a database used for simple reverse geocoding. The database rely on a table containing latitude, longitude and place name. Everytime a couple latitude,longitude is not present or, better, everytime the searched latitude,longitude differs too much from an existing latitude, longitude, I add a new row using GoogleMaps reverse geocoding service. Below the code to generate the address table:
CREATE TABLE `data_addresses` (
`ID` int(11) NOT NULL COMMENT 'Primary Key',
`LAT` int(11) NOT NULL COMMENT 'Latitude x 10000',
`LNG` int(11) NOT NULL COMMENT 'Longitude x 10000',
`ADDRESS` varchar(128) NOT NULL COMMENT 'Reverse Geocoded Street Address'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `data_addresses`
ADD PRIMARY KEY (`ID`),
ADD UNIQUE KEY `IDX_ADDRESS_UNIQUE_LATLNG` (`LAT`,`LNG`),
ADD KEY `IDX_ADDRESS_LAT` (`LAT`),
ADD KEY `IDX_ADDRESS_LNG` (`LNG`);
ALTER TABLE `data_addresses`
MODIFY `ID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary Key';
As you can see the trick is to use place two indexes on Latitude and Longitude. As normally latitude and longitude are float we use their value multiplied by 10000, so each couple latitude/longitude is unique. This implies a resolution of about 50m that is satisfying for my needs.
Now the problem: everytime I need to know if a given latitude/longitude (MyLat,MyLon) is already present or not I execute the following query:
SELECT `id`, ROUND(SQRT(POW(ABS(`LAT`-ROUND(MyLat*10000)),2)+POW(ABS(`LNG`-ROUND(MyLon*10000)),2))) AS R FROM splc_smarttrk.`data_addresses` ORDER BY R ASC LIMIT 1
This query will return to me the closest point and will give me also R (the rating): smaller R means closest approximation, so let say that everytime I find an R that is above 10 I need to add a new row to address table. Address table at present contains about 615k rows.
The problem is that despite indexes that I have placed this query is too slow (takes about 2 seconds on a 2x Xeon server). Below the results of Explain:
Upvotes: 0
Views: 114
Reputation: 722
Following the suggestion of Raymond Nijland I modified the query as follows:
SELECT `id` AS ID,
ROUND(SQRT(POW(ABS(`LAT`-ROUND(NLat*10000)), 2) +
POW(ABS(`LNG`-ROUND(NLon*10000)), 2))
) AS RT INTO ADDR_ID, RATING
FROM splc_smarttrk.`data_addresses`
WHERE (`LAT` BETWEEN (ROUND(NLat*10000)-R) AND (ROUND(NLat*10000)+R))
AND (`LNG` BETWEEN (ROUND(NLon*10000)-R) AND (ROUND(NLon*10000)+R))
ORDER BY RT ASC
LIMIT 1;
this trick reduces the dataset to 10 records in the worst case scenario, hence the speed is fair good despite the ORDER BY clause. In fact I don't really need to know the Distance from existing point, I just need to know if that distance is above a givel limit (here if is within a 10x10 rectangle that means R=5).
Upvotes: 0
Reputation: 142238
Instead of computing the distance (or in addition to), provide a "bounding box". This will be much faster.
Still faster would be the complex code here: mysql.rjweb.org/doc.php/latlng
Once you have UNIQUE KEY IDX_ADDRESS_UNIQUE_LATLNG (LAT, LNG)
, there is no need for KEY IDX_ADDRESS_LAT (LAT)
*10000 can fit in MEDIUMINT
. And it is good to about 16 meters or 52 feet.
Upvotes: 1
Reputation: 11602
Can't you optimize this by retriving a fixed dataset of nearby latitude(s) and longitude(s) and calculate the Rating (R) and pick the smallest Rating on this fixed dataset.
p.s not tested might contain errors in the sorting. but it might help you on your way.
SELECT
id
, ROUND(SQRT(POW(ABS(`LAT`-ROUND([LAT]*10000)),2)+POW(ABS(`LNG`- ROUND([LNG]*10000)),2))) AS R
FROM (
SELECT
LAT
FROM
data_addresses
WHERE
LAT <= [LAT]
ORDER BY
LAT DESC
LIMIT 100
UNION ALL
SELECT
LAT
FROM
data_addresses
WHERE
LAT >= [LAT]
ORDER BY
LAT ASC
LIMIT 100
SELECT
LNG
FROM
data_addresses
WHERE
LNG <= [LNG]
ORDER BY
LNG DESC
LIMIT 100
UNION ALL
SELECT
LNG
FROM
data_addresses
WHERE
LNG >= [LNG]
ORDER BY
LNG ASC
LIMIT 100
)
AS data_addresses_range
ORDER BY
R ASC
LIMIT 1
Upvotes: 2