Wolf
Wolf

Reputation: 576

Big Query Regex Extraction

I am trying to extract a item_subtype field from an URL.

This regex works fine in the to get the first item item_type

SELECT REGEXP_EXTRACT('info?item_type=icecream&item_subtype=chocolate/cookies%20cream,vanilla&page=1', r'item_type=(\w+)')

but what is the correct regex to get everything starting from 'chocolate' all the way to before the '&page1'

I have tried this, but can't seem to get it to work to go further

SELECT REGEXP_EXTRACT('info?item_type=icecream&item_subtype=chocolate/cookies%20cream,vanilla&page=1', r'item_subtype=(\w+[^Z])')

basically, I want to extract 'chocolate/cookies%20cream,vanilla'

Upvotes: 1

Views: 47

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626728

In your case, \w+ only matches one or more letters, digits or underscores. Your expected values may contain other characters, too.

You may use

SELECT REGEXP_EXTRACT('info?item_type=icecream&item_subtype=chocolate/cookies%20cream,vanilla&page=1', r'item_subtype=([^&]+)')

See the regex demo.

Notes:

  • item_subtype= - this string is matched as a literal char sequence
  • ([^&]+) - a Capturing group 1 that matches and captures one or more chars other than & into a separate memory buffer that is returned by REGEXP_EXTRACT function.

Upvotes: 1

Related Questions