Ravindra
Ravindra

Reputation: 2271

Comparing substrings in Google BigQuery

I want to Query the two table depending on the condition and want to generate the Tag as you can see in the image yes/maybe for generating the following result the query is

For displaying the above result i am using if(places.name CONTAINS poi.name 'Yes','Maybe') but the problem is : in the image on line no. 4 poi_name contain the value surana.agen and its respective column places_type have the value [W1]Surana.Agency and hence i want the Tag as Yes instead of Maybe. poi_name column can have the any special character like here dot(.) i want to split the columns values by whichever the special character present and in this case i want to search for the surana or agen present in the places_name.

Any help appreciable Thanks

Upvotes: 2

Views: 2098

Answers (3)

Ravindra
Ravindra

Reputation: 2271

My Query for the above image ios

SELECT a.id as ID, 
a.lattitude as poi_lat,
b.latitude as places_lat,
a.longitude as poi_long,
b.longitude as places_long,
a.name as poi_name,
a.comment as poi_comment,
b.name as places_name,
b.type as places_type,
if(b.name CONTAINS a.name 
AND b.lat_approximate = a.lat_approximate 
AND b.long_approximate = a.long_approximate, 
'Yes', 
if(b.lat_approximate = a.lat_approximate 
AND b.long_approximate = a.long_approximate, 'Maybe','No') 
) as Tag
FROM [grayrout_graycover.poientry_test] a
LEFT OUTER JOIN EACH [grayrout_graycover.placesdetails] b
ON a.lat_approximate = b.lat_approximate AND a.long_approximate = b.long_approximate 
WHERE b.type='P&G'
ORDER BY a.id;

Upvotes: 0

Ravindra
Ravindra

Reputation: 2271

I got the expected result for the question above by changing the condition in the select clause as if(UPPER(b.name) CONTAINS UPPER(REGEXP_REPLACE(a.name, r'\.\w*', ''))OR UPPER(b.name) CONTAINS UPPER(a.name) AND b.lat_approximate = a.lat_approximate AND b.long_approximate = a.long_approximate AND b.type='P&G', 'Yes', if(b.lat_approximate = a.lat_approximate AND b.long_approximate = a.long_approximate AND b.type='P&G', 'May be','Not') ) as Tag but it gives annoying result as bellow enter image description here

line no 246 which is Ok but line no 241 giving wrong result the Tag must be Yes. after getting this result I want to optimize it for line no 242 and 243.

Upvotes: 0

Jeremy Condit
Jeremy Condit

Reputation: 7046

At a high level, I'd suggest thinking about this problem as follows:

  • Step 1: Split poi_name into the substrings that you want to match.
  • Step 2: Check whether any of those substrings are contained in places_name.

For Step 1, it's hard to pull apart an arbitrary number of substrings in SQL. However, if you have some limit in mind (e.g., at most 3 substrings), you could pull them out using REGEXP_EXTRACT. For example:

 REGEXP_EXTRACT(poi_name, r'([^.]*)') AS first,
 REGEXP_EXTRACT(poi_name, r'[^.]*\.([^.]*)') AS second,
 REGEXP_EXTRACT(poi_name, r'[^.]*\.[^.]*\.([^.]*)') AS third,

Fortunately, Step 2 is easy. Once you've extracted those strings, use CONTAINS to test whether they're in places_name.

I'm not sure how much control you have over your input, but you might consider splitting poi_name into separate fields or into a repeated field in advance, since that would avoid the need to run REGEXP_EXTRACT on every query.

Upvotes: 1

Related Questions