Sander Klijsen
Sander Klijsen

Reputation: 33

BigQuery Left Join based on st_dwithin condition acting like an Inner Join

I need help with the following...

I've created a query that should join the records from another table based on a certain distance between two coordinates. I end up with a table that only has records with matching location names (like an inner join). I need every record in the table_customer_x and locationname should be a null if the distance between any location for that customer is > 250.

The query that I created:

 SELECT t.customerid, t.geolatitude, t.geolongitude, tt.locationname
 FROM `table_customer_x` t
 LEFT JOIN  `table_location` tt  
 on ST_DWITHIN(ST_GEOGPOINT(t.geoLatitude,t.geoLongitude), ST_GEOGPOINT(tt.latitude, tt.longitude), 250)
 where tt.customer_id= 204
 and t.timestamp > "2016-01-01"
 and tt.latitude <= 90 and tt.latitude >= -90

table_customer_x looks like:

timestamp             geoLatitude       geoLongitude    
2018-01-01 00:00:00       52.000             4.000

table_location looks like:

latitude       longitude       name      customer_id
 52.010          4.010      hospital x     204

Upvotes: 2

Views: 1615

Answers (2)

Michael Entin
Michael Entin

Reputation: 7744

It could have been a BigQuery bug, seems to be fixed now.

Geospatial outer join is not yet implemented, so this query should fail with message LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

The workaround is to simulate outer join using inner join: do inner join, then union with unmatched rows on the left side. It requires some unique key on the outer side to work properly, I'm not sure if you have one in table_customer_x.

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173086

[Why] BigQuery Left Join based on st_dwithin condition acting like an Inner Join

In BigQuery, Spatial JOINs are implemented for INNER JOIN and CROSS JOIN operators with the following standard SQL predicate functions:

ST_DWithin
ST_Intersects
ST_Contains
ST_Within
ST_Covers
ST_CoveredBy
ST_Equals
ST_Touches   

So, you cannot expect LEFT JOIN to work properly in your case - instead - your left JOIN is "converted" into CROSS JOIN with filter in ON clause moved into Where clause
So result you see is as expected

Summary - you just need to rewrite your query :o)

You can try something like below to workaround (not tested - just possible direction for you)

#standardSQL
SELECT tt.customer_id, t.geolatitude, t.geolongitude, tt.name
FROM `project.dataset.table_customer_x` t
JOIN  `project.dataset.table_location` tt  
ON ST_DWITHIN(ST_GEOGPOINT(t.geoLatitude,t.geoLongitude), ST_GEOGPOINT(tt.latitude, tt.longitude), 250)
UNION ALL
SELECT tt.customer_id, t.geolatitude, t.geolongitude, tt.name
FROM `project.dataset.table_customer_x` t
JOIN  `project.dataset.table_location` tt  
ON NOT ST_DWITHIN(ST_GEOGPOINT(t.geoLatitude,t.geoLongitude), ST_GEOGPOINT(tt.latitude, tt.longitude), 250)
WHERE tt.customer_id= 204
AND t.timestamp > "2016-01-01"
AND tt.latitude <= 90 AND tt.latitude >= -90

Upvotes: 3

Related Questions