Reputation: 33
I need help with the following...
I've created a query that should join the records from another table based on a certain distance between two coordinates. I end up with a table that only has records with matching location names (like an inner join). I need every record in the table_customer_x and locationname should be a null if the distance between any location for that customer is > 250.
The query that I created:
SELECT t.customerid, t.geolatitude, t.geolongitude, tt.locationname
FROM `table_customer_x` t
LEFT JOIN `table_location` tt
on ST_DWITHIN(ST_GEOGPOINT(t.geoLatitude,t.geoLongitude), ST_GEOGPOINT(tt.latitude, tt.longitude), 250)
where tt.customer_id= 204
and t.timestamp > "2016-01-01"
and tt.latitude <= 90 and tt.latitude >= -90
table_customer_x looks like:
timestamp geoLatitude geoLongitude
2018-01-01 00:00:00 52.000 4.000
table_location looks like:
latitude longitude name customer_id
52.010 4.010 hospital x 204
Upvotes: 2
Views: 1615
Reputation: 7744
It could have been a BigQuery bug, seems to be fixed now.
Geospatial outer join is not yet implemented, so this query should fail with message LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
The workaround is to simulate outer join using inner join: do inner join, then union with unmatched rows on the left side. It requires some unique key on the outer side to work properly, I'm not sure if you have one in table_customer_x.
Upvotes: 1
Reputation: 173086
[Why] BigQuery Left Join based on st_dwithin condition acting like an Inner Join
In BigQuery, Spatial JOINs are implemented for INNER JOIN and CROSS JOIN operators with the following standard SQL predicate functions:
ST_DWithin
ST_Intersects
ST_Contains
ST_Within
ST_Covers
ST_CoveredBy
ST_Equals
ST_Touches
So, you cannot expect LEFT JOIN to work properly in your case - instead - your left JOIN is "converted" into CROSS JOIN with filter in ON clause moved into Where clause
So result you see is as expected
Summary - you just need to rewrite your query :o)
You can try something like below to workaround (not tested - just possible direction for you)
#standardSQL
SELECT tt.customer_id, t.geolatitude, t.geolongitude, tt.name
FROM `project.dataset.table_customer_x` t
JOIN `project.dataset.table_location` tt
ON ST_DWITHIN(ST_GEOGPOINT(t.geoLatitude,t.geoLongitude), ST_GEOGPOINT(tt.latitude, tt.longitude), 250)
UNION ALL
SELECT tt.customer_id, t.geolatitude, t.geolongitude, tt.name
FROM `project.dataset.table_customer_x` t
JOIN `project.dataset.table_location` tt
ON NOT ST_DWITHIN(ST_GEOGPOINT(t.geoLatitude,t.geoLongitude), ST_GEOGPOINT(tt.latitude, tt.longitude), 250)
WHERE tt.customer_id= 204
AND t.timestamp > "2016-01-01"
AND tt.latitude <= 90 AND tt.latitude >= -90
Upvotes: 3