foxwendy
foxwendy

Reputation: 2929

Google BigQuery - regex functions for utf-8

SELECT  REGEXP_EXTRACT('name=北海道新幹線|gauge=1435|layer=2|usage=main|bridge=yes|', r'layer=(-?[0-9]+)|') AS Layer

yield null. while

select  REGEXP_EXTRACT('layer=2|',r'layer=(-?[0-9]+)|') as layer

yield the wanted result : 2

I suspect the first one failed due to utf-8 characters?

How to extract layer info for the first string?

Upvotes: 0

Views: 289

Answers (1)

Elliott Brossard
Elliott Brossard

Reputation: 33745

The trailing | is interpreted as OR. You need to escape it with a backslash, e.g.:

SELECT  REGEXP_EXTRACT('name=北海道新幹線|gauge=1435|layer=2|usage=main|bridge=yes|', r'layer=(-?[0-9]+)\|') AS Layer

Without the backslash, the second example that you have works because the regex is interpreted as the pattern OR the empty string.

Upvotes: 1

Related Questions