Imaky
Imaky

Reputation: 1257

Left join with string function?

I need to left join two tables:

I did something like this:

SELECT
   CASE 
      WHEN b.domain IS NULL then "Invalid"
      ELSE "Valid"
   END as Validated
FROM Emails e
LEFT JOIN DomainBlacklist b
ON ENDS_WITH(LOWER(e.email), LOWER(b.domain))

But throws me an error:

"LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join."

Someone know how can I solve this?

Thanks!

Upvotes: 0

Views: 722

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
SELECT email, 
  IF(MAX(ENDS_WITH(LOWER(email), LOWER(domain))), 'invalid', 'valid') AS Validated
FROM `project.dataset.Emails`
CROSS JOIN `project.dataset.DomainBlacklist`
GROUP BY email 

You can test / play with above query using dummy data as below

#standardSQL
WITH `project.dataset.Emails` AS (
  SELECT email
  FROM UNNEST(['[email protected]','[email protected]','[email protected]','[email protected]']) AS email 
), `project.dataset.DomainBlacklist` AS (
  SELECT domain
  FROM UNNEST(['uvw.com','qwe.net']) AS domain
)
SELECT email, 
  IF(MAX(ENDS_WITH(LOWER(email), LOWER(domain))), 'invalid', 'valid') AS Validated
FROM `project.dataset.Emails`
CROSS JOIN `project.dataset.DomainBlacklist`
GROUP BY email 

result is

email           Validated    
[email protected]   valid    
[email protected]   valid    
[email protected]   invalid  
[email protected]   valid    

Upvotes: 2

Elliott Brossard
Elliott Brossard

Reputation: 33745

In theory it should be possible to express this as a join with an equality; you need to strip the @ from the email address first:

SELECT
   CASE 
      WHEN b.domain IS NULL then "Invalid"
      ELSE "Valid"
   END as Validated
FROM Emails e
LEFT JOIN DomainBlacklist b
ON LOWER(SPLIT(e.email, '@')[SAFE_OFFSET(1)]) = LOWER(b.domain)

Using sample data:

WITH Emails AS (
  SELECT '[email protected]' AS email UNION ALL
  SELECT '[email protected]' UNION ALL
  SELECT 'invalid_email' UNION ALL
  SELECT '[email protected]'
), DomainBlacklist AS (
  SELECT 'example.com' AS domain UNION ALL
  SELECT 'bar.com'
)
SELECT
   CASE 
      WHEN b.domain IS NULL then "Invalid"
      ELSE "Valid"
   END as Validated
FROM Emails e
LEFT JOIN DomainBlacklist b
ON LOWER(SPLIT(e.email, '@')[SAFE_OFFSET(1)]) = LOWER(b.domain)

Upvotes: 2

Related Questions