Eisyjul
Eisyjul

Reputation: 43

How to extract with REGEXP the specific combination in BigQuery

I have a long text, here is the part of it: "placement":1,"protocols":[2,3,5,6]},"secure":1

And I need to extract the list of protocols, so the result will be only [2,3,5,6].

I was using REGEXP_EXTRACT(text, r'"protocols":([^"]+)'), but the result is always different: sometimes it is only [2,3,5,6] and sometimes it takes more: [2,3,5,6]},

How to build my REGEXP so the result will be always only the list in brackets?

Upvotes: 1

Views: 115

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626845

You can use

REGEXP_EXTRACT(text, r'"protocols"\s*:\s*(\[[^][]+])')

See the regex demo

To get the contents of protocols without the brackets, move the grouping boundaries a bit inward:

REGEXP_EXTRACT(text, r'"protocols"\s*:\s*\[([^][]+)]')

See this regex demo.

Details

  • "protocols" - a literal text
  • \s*:\s* - a colon enclosed with zero or more whitespace
  • \[ - a [ char
  • [^][]+ - one or more chars other than [ and ]
  • ] - a ] char.

Upvotes: 1

Related Questions