John
John

Reputation: 59

Finding strings between dashes using REGEXP_EXTRACT in Bigquery

In Bigquery, I am trying to find a way to extract particular segments of a string based on how many dashes come before it. The number of total dashes in the string will always be the same. For example, I could be looking for the string after the second dash and before the third dash in the following string:

abc-defgh-hij-kl-mnop

Currently, I am using the following regex to extract, which counts the dashes from the back:

([^-]+)(?:-[^-]+){2}$

The problem is that if there is nothing in between the dashes, the regex doesn't work. For example, something like this returns null:

abc-defgh-hij--mnop

Is there a way to use regex to extract a string after a certain number of dashes and cut it off before the subsequent dash?

Thank you!

Upvotes: 1

Views: 1704

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173210

Below is for BigQuery Standrd SQL

The simplest way in your case is to use SPLIT and OFFSET as in below example

SELECT SPLIT(str, '-')[OFFSET(3)]   

above will return empty string for abc-defgh-hij--mnop

to prevent error in case of calling non-existing element - better to use SAFE_OFFSET

SELECT SPLIT(str, '-')[SAFE_OFFSET(3)]   

Upvotes: 3

Related Questions