manesioz
manesioz

Reputation: 837

How does BigQuery perform spatial joins?

From the official docs we read that LEFT/RIGHT/FULL OUTER JOINS are not optimized for spatial data. I have been running several long queries that use complex joins on GEOGRAPHY data types.

My question is, how does BigQuery deal with spatial data join under the hood? Is everything converted to Geohash?

I have tried clustering my table by a GEOGRAPHY type column but so far speed improvements have been negligible.

If I use Geohash (STRING) in a where clause for a JOIN instead of a GEOGRAPHY type does that result in a performance boost?

Here's an example of what I'm talking about:

select t1.Geohash, t1.Name, t1.Way, t1.Long, t1.Lat, t1.CoreInt
       , t1.Label, t1.IntLat, t1.IntLong
       , row_number() over(partition by Geohash order by Dist) as RowNum
       , Distance
  from table_name t1
  left outer join (select Geohash, Label from table where CoreInt = 1) t2 
 using (Geohash)
 where t2.Label is null
    or t1.Label = t2.Label

Thanks

Upvotes: 2

Views: 1000

Answers (1)

Michael Entin
Michael Entin

Reputation: 7744

Yes, BigQuery does not optimize LEFT/RIGHT/OUTER spatial JOINs yet.

For now you would need to convert such joins to INNER JOIN + select mismatched rows, see this question for example: How to JOIN in geography columns using ST_CONTAINS in Big query

Internally BigQuery uses S2 indexing. It might be faster or slower than join on geohash, depending on data. But unlike join on geohash it guarantees correct result.

Joining on Geohash has two major issues:

1) Geohash buckets are not uniform, geohashes of same fixed length describe much bigger real areas near equator than near poles. S2 provides more uniform indexing.

2) Geohash also may miss some pairs that should be joined, when two geographies are close enough but are just across geohash split boundary and thus hash to different values. For example, points just below and just above 45 parallel would have different geohash values, even if they are very close, and one expects them to join.

Upvotes: 8

Related Questions