Reputation: 133
Say I have a column called merchants containing these values:
Al's Coffee
Belinda & Mark Bakery
Noodle Shop 38
How can i get it to extract:
alscoffee
belindamarkbakery
noodleshop38
So far I've been using replace(lower(merchants), '&', '')
for every individual character, but is there any way to avoid using so many nested replaces?
Upvotes: 2
Views: 12856
Reputation: 173003
How extract only alphanumeric characters from string?
Below is for BigQuery Standard SQL
You can also use \W - not word characters (≡ [^0-9A-Za-z_])
as in example below
REGEXP_REPLACE(merchant, r'\W', '')
If to apply to sample data from your question
#standardSQL
WITH `project.dataset.table` AS (
SELECT "Al's Coffee" merchant UNION ALL
SELECT "Belinda & Mark Bakery" UNION ALL
SELECT "Noodle Shop 38"
)
SELECT REGEXP_REPLACE(merchant, r'\W', '') cleaned_merchant
FROM `project.dataset.table`
result is
Row cleaned_merchant
1 AlsCoffee
2 BelindaMarkBakery
3 NoodleShop38
Obviously, if you need result in lower case - add respective function - for example
SELECT REGEXP_REPLACE(LOWER(merchant), r'\W', '') cleaned_merchant
FROM `project.dataset.table`
Upvotes: 3
Reputation: 133
what worked for me in BigQuery was
SELECT regexp_replace(lower(Al's Coffee), "[^0-9a-zA-z]+","")
this would return alscoffee
Upvotes: 0
Reputation: 1269933
In BigQuery, you can use regexp_replace()
:
select regexp_replace(merchant, '[^a-zA-Z0-9]', '')
This is saying to replace anything that is not a character or digit with an empty string. If you want to keep more characters, add them to the character class.
Upvotes: 3
Reputation: 16
I have a solution for you !!
SELECT merchants AS Original, LOWER(merchants) AS Lowercase FROM TableName;
Lowercase column display the required results as you want !!
Upvotes: 0