Poor performance to calculate distances with latitude and longitude

Question

I'm trying to get the quantity of people in 2km or less from specific locations, to do that I'm calculating distances with latitude and longitude. In one table I just have latitude, longitude and in other I have more fields but with latitude and longitude as well.

Table1 = 488792 rows
Table2 = 63003 rows

The query is valid and will process 12.3 MB when run.

The query I'm using is:

select 
e.lat,
e.long,
e.searches,
count(distinct l.id)
from dataset.table1 e
join dataset.table2 l 
     on 6371000*ACOS(COS(3.14159265359/180*(90-e.lat)) *COS(3.14159265359/180*(90-l.lat)) +SIN(3.14159265359/180*(90-e.lat)) *SIN(3.14159265359/180*(90-l.lat)) *COS(3.14159265359/180*(e.long-l.long))) <= 2000 # way to calculate distance from lats and longs
group by e.lat,
e.long,
e.searches

But the query is not running, it takes more than 15 minutes every time and I have to cancel.

What could be the problem?

Felipe Hoffa · Accepted Answer

This query, similar to your original one, takes 2 minutes:

SELECT distance, COUNT(*) FROM (
SELECT
  e.lat,
  e.long
  , 6371000*ACOS(COS(3.14159265359/180*(90-e.lat)) *COS(3.14159265359/180*(90-l.lat)) +SIN(3.14159265359/180*(90-e.lat)) *SIN(3.14159265359/180*(90-l.lat)) *COS(3.14159265359/180*(e.long-l.long))) <= 2000 distance
  , e.long-l.long longlong, e.lat-l.lat latlat
FROM
  `buoyant-history-159518.test_lat_long.table1` e
JOIN
  `buoyant-history-159518.test_lat_long.table1` l
ON 
 (COS(3.14159265359/180*(90-e.lat)) *COS(3.14159265359/180*(90-l.lat)) +SIN(3.14159265359/180*(90-e.lat)) *SIN(3.14159265359/180*(90-l.lat)) *COS(3.14159265359/180*(e.long-l.long))) <= COS(2000/6371000) + 4.5E-8
)
GROUP BY distance

To prevent floating point errors, I had to transform the JOIN inequality:

6371000*ACOS(COS(3.14159265359/180*(90-e.lat)) *COS(3.14159265359/180*(90-l.lat)) +SIN(3.14159265359/180*(90-e.lat)) *SIN(3.14159265359/180*(90-l.lat)) *COS(3.14159265359/180*(e.long-l.long))) 
<= 2000

to the similar:

(COS(3.14159265359/180*(90-e.lat)) *COS(3.14159265359/180*(90-l.lat)) +SIN(3.14159265359/180*(90-e.lat)) *SIN(3.14159265359/180*(90-l.lat)) *COS(3.14159265359/180*(e.long-l.long))) <= COS(2000/6371000) + 4.5E-8)
<= COS(2000/6371000) + 4.5E-8

Now the question is how can we get a much better performance than 2 minutes? Let's add to the JOIN some 'sanity' filters -> No 2 points in the same area can have a distance between lats and long more than 0.something:

SELECT distance, COUNT(*) FROM (
SELECT
  e.lat,
  e.long
  , (COS(3.14159265359/180*(90-e.lat)) *COS(3.14159265359/180*(90-l.lat)) +SIN(3.14159265359/180*(90-e.lat)) *SIN(3.14159265359/180*(90-l.lat)) *COS(3.14159265359/180*(e.long-l.long))) <= COS(2000/6371000) distance
  , e.long-l.long longlong, e.lat-l.lat latlat
FROM
  `buoyant-history-159518.test_lat_long.table1` e
JOIN
  `buoyant-history-159518.test_lat_long.table1` l
ON 
 NOT (e.long=l.long AND e.lat=l.lat) 
 AND ABS(e.long-l.long) < 0.021 #sanity JOIN check
 AND ABS(e.lat-l.lat) < 0.018 #sanity JOIN check
)
GROUP BY distance

With this we get very similar results, but in 12 seconds instead of 2 minutes.

I can't optimize your exact query, as your sample tables don't have the same number or rows nor columns - but trying applying those "sanity JOIN checks" before going for the full CROSS JOIN.

Poor performance to calculate distances with latitude and longitude

Answers (2)

Related Questions