RoyalSwish
RoyalSwish

Reputation: 1573

BigQuery - Alternative method to Positive Lookahead for RegExes

I've written a RegEx pattern that identifies alpha-characters that are immediately followed by a numeric character, with the intention that it would used in BigQuery's REGEXP_EXTRACT function.

Here's the pattern: ([A-Z]|[a-z])*(?=[0-9])

However, due to BigQuery's use of RE2 expression library, the Positive Lookahead function does not work. What's an alternative method of identifying the numeric character without including it in the extracted string/match?

Use case:

To extract the first 1 or 2 alpha-characters of a UK postcode, e.g.

Upvotes: 0

Views: 269

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627262

You can use

REGEXP_EXTRACT(col, '^[A-Za-z]+')

The ^[A-Za-z]+ regex matches

  • ^ - start of string
  • [A-Za-z]+ - one or more letters.

Also, if you MUST check for a digit right after the initial letters, you can use a

REGEXP_EXTRACT(col, '^([A-Za-z]+)[0-9]')

The ^([A-Za-z]+)[0-9] regex matches and captures into Group 1 the initial letters, and then just matches a digit (with [0-9]). The REGEXP_EXTRACT function returns the captured substring if there is a capturing group.

Upvotes: 1

Related Questions