Reputation: 3915
I have two tables, say A and B that contain city information with two columns: latitude and longitude. A contains 100,000 records and B contains 1,000,000 records. My objective is to find the rows of B that are within 1 kilometre from A (for each row in A). How do I go about doing this efficiently? I am targeting a time of less than 30 minutes.
The following query takes forever (which I believe is the result of the cross-product of 100,000 * 1,000,000 = 100 billion row comparisons!):
select *
from A
inner join B
on is_nearby(A.latitude, A.longitude, B.latitude, B.longitude)
is_nearby()
is just a simple function that finds the difference between the latitudes and longitudes.
I did a test for one row of A, it takes about 5 seconds per row. By my calculation, it is going to take several weeks for the query to finish execution, which is not acceptable.
Upvotes: 3
Views: 783
Reputation: 43642
Yes, PostGIS will make things faster, since it (a) knows how to convert degrees of latitude and longitude to kilometres (I'll use the geography type below), and (b) supports a GiST index, which is optimal for GIS.
Assuming you have PostGIS version 2 available on your system, upgrade your datbase and tables:
CREATE EXTENSION postgis;
-- Add a geog column to each of your tables, starting with table A
ALTER TABLE A ADD COLUMN geog geography(Point,4326);
UPDATE A SET geog = ST_MakePoint(longitude, latitude);
CREATE INDEX ON A USING GIST (geog);
--- ... repeat for B, C, etc.
Now to find the rows of B that are within 1 kilometre from A (for each row in A):
SELECT A.*, B.*, ST_Distance(A.geog, B.geog)/1000 AS dist_km
FROM A
JOIN B ON ST_DWithin(A.geog, B.geog, 1000);
Upvotes: 3