angsty_robot
angsty_robot

Reputation: 315

Bigquery SQL SELECT WHERE field contains many words from other table

I have a table of student info with full names, and a list of first names.

full_names               first_names  
---------------         -------------
john adam smith          john
jane anna doe            jane
michael brown            bob

I want to select the students whose full names contain ANY of a long list of first names. For example, John Adam Smith contains the word john, I want to select that row. Michael Brown doesn't match any of names so I don't want it.

I could do this but it's crazy inefficient typing 300+ names.

SELECT names
FROM full_names
WHERE names CONTAIN 'john' 
  OR names CONTAIN 'jane'
  OR names CONTAIN 'bob'
...
(300 more rows)

What I've tried - Made the list of names into a second table first_names, and tried to select names from the first table that were in the second.

SELECT names
FROM full_names 
WHERE names CONTAINS 
    (SELECT names
    FROM first_names);

However, I get the error:

Error: ELEMENT can only be applied to result with 0 or 1 row.

Is there a better way?

Upvotes: 2

Views: 12534

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172984

Below is for BigQuery Standard SQL

#standardSQL
SELECT full_name
FROM `project.dataset.full_names` a
CROSS JOIN `project.dataset.first_names` b 
GROUP BY full_name
HAVING MAX(REGEXP_CONTAINS(full_name, name))   

You can test / play with above using dummy data from your questions as below

#standardSQL
WITH `full_names` AS (
  SELECT 'john adam smith' full_name UNION ALL
  SELECT 'jane anna doe' UNION ALL
  SELECT 'michael brown' 
), `first_names` AS (
  SELECT 'john' name UNION ALL
  SELECT 'jane' UNION ALL
  SELECT 'bob' 
)
SELECT full_name
FROM `full_names` a
CROSS JOIN `first_names` b 
GROUP BY full_name
HAVING MAX(REGEXP_CONTAINS(full_name, name))

result is

full_name    
---------
john adam smith  
jane anna doe    

More options:

#standardSQL
SELECT DISTINCT full_name
FROM `project.dataset.full_names` a
JOIN `project.dataset.first_names` b 
ON full_name LIKE CONCAT('%', name, '%')  

OR

SELECT DISTINCT full_name
FROM `project.dataset.full_names`, UNNEST(SPLIT(full_name, ' ')) part
JOIN `project.dataset.first_names` 
ON part = name

Upvotes: 3

G.Arima
G.Arima

Reputation: 1171

Try this out:

SELECT names
FROM 
full_names a
inner join
first_names b
on a.names like CONCAT('%', b.names, '%') ;

Let me know in case of any queries.

Upvotes: 1

Cookie Monster
Cookie Monster

Reputation: 646

Untested but try this: The column names are according to the columns in the snapshot

 Select * from full_names
    where  first_names IN
    ( Select full_names from full_names)

Upvotes: 0

Related Questions