Regressor
Regressor

Reputation: 1973

How to parse email addresses in Google BigQuery

I have a table as below in BigQuery -

with temp as (
select "[email protected]" as email_id
union all 
select "[email protected]" as email_id
)
select * from temp

I would like to generate 3 new columns (firstname, lastname, company) from the email_id field such that the output would be -

firstname, lastname, company
john         doe      company-Y
hello        world    company-X

Which BigQuery function can be used for this?

Upvotes: 1

Views: 1426

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173210

Below is for BigQuery Standard SQL

There are really too many ways of doing this and below is quick (first came in mind) one:

#standardSQL
WITH temp AS (
  SELECT "[email protected]" AS email_id UNION ALL 
  SELECT "[email protected]" AS email_id
)
SELECT 
  SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(0)], '.')[SAFE_OFFSET(0)] firstname,
  SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(0)], '.')[SAFE_OFFSET(1)] lastname,
  SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(1)], '.')[SAFE_OFFSET(0)] company
FROM temp    

with result

Row firstname   lastname    company  
1   john        doe         company-Y    
2   hello       world       company-X      

But really right solution will depend on nature and pattern your data has and obviously personal preferences, etc.

Another quick option would be

#standardSQL
WITH temp AS (
  SELECT "[email protected]" AS email_id UNION ALL 
  SELECT "[email protected]" AS email_id 
)
SELECT 
  REGEXP_EXTRACT(email_id, r'^(.*?)[.@]') firstname,
  REGEXP_EXTRACT(email_id, r'\.(.*?)@') lastname,
  REGEXP_EXTRACT(email_id, r'@(.*?)\.') company
FROM temp

with same result

Just to extend a little - so you see direction for improvements - for example if names are separated bu . or -

#standardSQL
WITH temp AS (
  SELECT "[email protected]" AS email_id UNION ALL 
  SELECT "[email protected]" AS email_id UNION ALL
  SELECT "[email protected]" AS email_id UNION ALL
  SELECT "[email protected]" AS email_id
)
SELECT email_id,
  REGEXP_EXTRACT(email_id, r'^(.*?)[-.@]') firstname,
  REGEXP_EXTRACT(email_id, r'[-.](.*?)@') lastname,
  REGEXP_EXTRACT(email_id, r'@(.*?)\.') company
FROM temp

with result

Row email_id                    firstname   lastname    company  
1   [email protected]      john        doe         company-Y    
2   [email protected]   hello       world       company-X    
3   [email protected]   hello       world       company-X    
4   [email protected]         hello       null        company-X    

Upvotes: 3

Related Questions