Thissiteusescookies
Thissiteusescookies

Reputation: 133

How extract only alphanumeric characters from string? (SQL Google BigQuery)

Say I have a column called merchants containing these values:

Al's Coffee  
Belinda & Mark Bakery  
Noodle Shop 38  

How can i get it to extract:

alscoffee  
belindamarkbakery  
noodleshop38  

So far I've been using replace(lower(merchants), '&', '') for every individual character, but is there any way to avoid using so many nested replaces?

Upvotes: 2

Views: 12856

Answers (4)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173003

How extract only alphanumeric characters from string?

Below is for BigQuery Standard SQL

You can also use \W - not word characters (≡ [^0-9A-Za-z_]) as in example below

REGEXP_REPLACE(merchant, r'\W', '')   

If to apply to sample data from your question

#standardSQL
WITH `project.dataset.table` AS (
  SELECT "Al's Coffee" merchant UNION ALL
  SELECT "Belinda & Mark Bakery" UNION ALL
  SELECT "Noodle Shop 38" 
)
SELECT REGEXP_REPLACE(merchant, r'\W', '') cleaned_merchant
FROM `project.dataset.table`   

result is

Row cleaned_merchant     
1   AlsCoffee    
2   BelindaMarkBakery    
3   NoodleShop38       

Obviously, if you need result in lower case - add respective function - for example

SELECT REGEXP_REPLACE(LOWER(merchant), r'\W', '') cleaned_merchant
FROM `project.dataset.table` 

Upvotes: 3

Thissiteusescookies
Thissiteusescookies

Reputation: 133

what worked for me in BigQuery was

SELECT regexp_replace(lower(Al's Coffee), "[^0-9a-zA-z]+","")

this would return alscoffee

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269933

In BigQuery, you can use regexp_replace():

select regexp_replace(merchant, '[^a-zA-Z0-9]', '')

This is saying to replace anything that is not a character or digit with an empty string. If you want to keep more characters, add them to the character class.

Upvotes: 3

Muhammad Usman
Muhammad Usman

Reputation: 16

I have a solution for you !!

SELECT merchants AS Original, LOWER(merchants) AS Lowercase FROM TableName;

Lowercase column display the required results as you want !!

Upvotes: 0

Related Questions