arimbun
arimbun

Reputation: 3915

PostgreSQL - optimising joins on latitudes and longitudes comparing distances

I have two tables, say A and B that contain city information with two columns: latitude and longitude. A contains 100,000 records and B contains 1,000,000 records. My objective is to find the rows of B that are within 1 kilometre from A (for each row in A). How do I go about doing this efficiently? I am targeting a time of less than 30 minutes.

The following query takes forever (which I believe is the result of the cross-product of 100,000 * 1,000,000 = 100 billion row comparisons!):

select *
from A
inner join B
on is_nearby(A.latitude, A.longitude, B.latitude, B.longitude)

is_nearby() is just a simple function that finds the difference between the latitudes and longitudes.

I did a test for one row of A, it takes about 5 seconds per row. By my calculation, it is going to take several weeks for the query to finish execution, which is not acceptable.

Upvotes: 3

Views: 783

Answers (1)

Mike T
Mike T

Reputation: 43642

Yes, PostGIS will make things faster, since it (a) knows how to convert degrees of latitude and longitude to kilometres (I'll use the geography type below), and (b) supports a GiST index, which is optimal for GIS.

Assuming you have PostGIS version 2 available on your system, upgrade your datbase and tables:

CREATE EXTENSION postgis;

-- Add a geog column to each of your tables, starting with table A
ALTER TABLE A ADD COLUMN geog geography(Point,4326);
UPDATE A SET geog = ST_MakePoint(longitude, latitude);
CREATE INDEX ON A USING GIST (geog);
--- ... repeat for B, C, etc.

Now to find the rows of B that are within 1 kilometre from A (for each row in A):

SELECT A.*, B.*, ST_Distance(A.geog, B.geog)/1000 AS dist_km
FROM A
JOIN B ON ST_DWithin(A.geog, B.geog, 1000);

Upvotes: 3

Related Questions