itsPav
itsPav

Reputation: 612

Bigquery SQL nested regex_replace simplification

Currently normalizing the address column in BQ using a lot of regex_replace functions that each have a unique match and replacement value. Example;

regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace
            (regexp_replace(regexp_replace(regexp_replace(regexp_replace(UPPER(address),
"STREET", "ST"),
"ROAD", "RD"),
...

Is there a simpler way of doing this?

Upvotes: 1

Views: 480

Answers (1)

Samuel
Samuel

Reputation: 3528

You can clean up your query by using a UDF:

Create temp function replace_me(s string,a1 string,a2 string,b1 string,b2 string,c1 string,c2 string) as 
(replace(replace(replace(s,a1,a2),b1,b2),c1,c2));
select replace_me(replace_me(upper("Road 5, Street 7"),"ROAD","RD","STREET","ST","",""),"","","","","","")

In case you need to abbreviate only word, a regex can be used:

SELECT 
  REGEXP_REPLACE("ROAD 7, STREET 5, NEW YORK", CONCAT(r'(?i)(RO)AD|(ST)REET|(N)EW (Y)ORK'), r'\1\2\3\4')

Upvotes: 1

Related Questions