PostgreSQL - optimising joins on latitudes and longitudes comparing distances

Question

I have two tables, say A and B that contain city information with two columns: latitude and longitude. A contains 100,000 records and B contains 1,000,000 records. My objective is to find the rows of B that are within 1 kilometre from A (for each row in A). How do I go about doing this efficiently? I am targeting a time of less than 30 minutes.

The following query takes forever (which I believe is the result of the cross-product of 100,000 * 1,000,000 = 100 billion row comparisons!):

select *
from A
inner join B
on is_nearby(A.latitude, A.longitude, B.latitude, B.longitude)

is_nearby() is just a simple function that finds the difference between the latitudes and longitudes.

I did a test for one row of A, it takes about 5 seconds per row. By my calculation, it is going to take several weeks for the query to finish execution, which is not acceptable.

Mike T · Accepted Answer

Yes, PostGIS will make things faster, since it (a) knows how to convert degrees of latitude and longitude to kilometres (I'll use the geography type below), and (b) supports a GiST index, which is optimal for GIS.

Assuming you have PostGIS version 2 available on your system, upgrade your datbase and tables:

CREATE EXTENSION postgis;

-- Add a geog column to each of your tables, starting with table A
ALTER TABLE A ADD COLUMN geog geography(Point,4326);
UPDATE A SET geog = ST_MakePoint(longitude, latitude);
CREATE INDEX ON A USING GIST (geog);
--- ... repeat for B, C, etc.

Now to find the rows of B that are within 1 kilometre from A (for each row in A):

SELECT A.*, B.*, ST_Distance(A.geog, B.geog)/1000 AS dist_km
FROM A
JOIN B ON ST_DWithin(A.geog, B.geog, 1000);

PostgreSQL - optimising joins on latitudes and longitudes comparing distances

Answers (1)

Related Questions