Sebastian
Sebastian

Reputation: 967

What's the equivalent of Excel's `left(find(), -1)` in BigQuery?

I have names in my dataset and they include parentheses. But, I am trying to clean up the names to exclude those parentheses.

Example: ABC Company (Somewhere, WY)

What I want to turn it into is: ABC Company

I'm using standard SQL with google big query.

I've done some research and I know big query has left(), but I do not know the equivalent of find(). My plan was to do something that finds the ( and then gives me everything to the left of -1 characters from the (.

Upvotes: 1

Views: 3808

Answers (4)

Graham Polley
Graham Polley

Reputation: 14781

Just use REGEXP_REPLACE + TRIM. This will work with all variants (just not nested parentheses):

#standardSQL
WITH
  names AS (
  SELECT
    'ABC Company  (Somewhere, WY)' AS name
  UNION ALL
  SELECT
    '(Somewhere, WY)  ABC Company' AS name
  UNION ALL
  SELECT
    'ABC (Somewhere, WY) Company' AS name)
SELECT
  TRIM(REGEXP_REPLACE(name,r'\(.*?\)',''), ' ') AS cleaned
FROM
  names

enter image description here

Upvotes: 3

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

My plan was to do something that finds the ( and then gives me everything to the left of -1 characters from the (.

Good plan! In BigQuery Standard SQL - equivalent of LEFT is SUBSTR(value, position[, length]) and equivalent of FIND is STRPOS(value1, value2)

With this in mind your query can look like (which is exactly as you planned)

#standardSQL
WITH names AS (
  SELECT 'ABC Company (Somewhere, WY)' AS name 
)
SELECT SUBSTR(name, 1, STRPOS(name, '(') - 1) AS clean_name 
FROM names   

Usually, string functions are less expensive than regular expression functions, so if you have pattern as in your example - you should go with above version

But in more generic cases, when pattern to clean is more dynamic like in Graham's answer - you should go with solution in Graham's answer

Upvotes: 3

Lamar
Lamar

Reputation: 1839

I can't test this solution at the moment, but you can combine SUBSTR and INSTR. Like this:

SELECT CASE WHEN INSTR(name, '(') > 0 THEN SUBSTR( name, 1, INSTR(name, '(') ) ELSE name END as name FROM table;

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520908

Use REGEXP_EXTRACT:

SELECT
    RTRIM(REGEXP_EXTRACT(names, r'([^(]*)')) AS new_name
FROM yourTable

The regex used here will greedily consume and match everything up until hitting an opening parenthesis. I used RTRIM to remove any unwanted whitespace picked up by the regex.

Note that this approach is robust with respect to the edge case of an address record not having any term with parentheses. In this case, the above query would just return the entire original value.

Upvotes: 1

Related Questions