user1152532

Reputation: 707

Trying to get multiple matches out of a string with REGEX_EXTRACT_ALL()

I have a string field error_cd with the value "cntrlb cntrlb asdv cntrlb asvd cntrla cntrlb cntrlb"

In Pig, I'm trying to use REGEX_EXTRACT_ALL(error_cd, '.*(cntrl[a-b]).*') to get back a tuple of (cntrlb,cntrlb,cntrlb,cntrla,cntrlb,cntrlb), or just (cntrl,cntrl,...,cntrl). Instead, I'm getting back only one match (cntrl).

Does anybody know how to return all of the matches, as the function name implies?

Upvotes: 0

Views: 354

Answers (1)

reo katoa

Reputation: 5801

REGEX_EXTRACT_ALL extracts all of the capturing groups in a regular expression and returns them as the fields of a single tuple. It does not apply one regular expression multiple times, so a pattern with a single capturing group, like yours, yields a one-field tuple. This document is somewhat out of date, but it is still accurate for REGEX_EXTRACT_ALL.
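To see the difference concretely, here is a small Java sketch (Pig's regex functions take Java regular expressions). The class and method names are hypothetical, and the first method assumes a whole-string match, which is what the question's pattern with a leading and trailing .* is written for. allGroupsOfOneMatch returns one entry per capturing group of a single match, while everyMatch applies a one-group pattern repeatedly with Matcher.find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: "all groups of one match" versus "every match of the pattern".
final class GroupsVersusMatches {

    // Returns every capturing group of one whole-string match (empty list if no match).
    static List<String> allGroupsOfOneMatch(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (!m.matches()) {
            return groups; // the pattern does not match the whole string
        }
        // groupCount() is fixed by the pattern: one pair of parentheses means one group.
        for (int i = 1; i <= m.groupCount(); i++) {
            groups.add(m.group(i));
        }
        return groups;
    }

    // Applies the pattern repeatedly across the input and collects group 1 of each match.
    static List<String> everyMatch(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

With the question's string, allGroupsOfOneMatch(s, ".*(cntrl[a-b]).*") returns only [cntrlb]: the greedy leading .* pushes the single group onto the last occurrence. everyMatch(s, "(cntrl[a-b])") returns all six occurrences.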

There is no regular expression that can capture an arbitrary number of groups (see this question). If you have a known upper bound on how many times your cntrl string can occur, you could design a big, ugly regex with one capturing group per possible occurrence, but it sounds like you'd be better off using TOKENIZE and then treating each word in your string individually.
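Both options can be sketched in Java under the same assumptions as above (hypothetical class and method names; Java regex semantics). tokenizeAndFilter splits on whitespace, which only approximates TOKENIZE (TOKENIZE also splits on a few punctuation characters), and keeps the words that match cntrl[a-b] exactly. boundedGroups builds the "big ugly regex" with one optional group per allowed occurrence, up to a caller-chosen maxCount.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the two approaches: tokenize-then-filter, and a bounded regex.
final class CntrlExtraction {

    private static final Pattern CNTRL = Pattern.compile("cntrl[a-b]");

    // Split on whitespace (a rough stand-in for TOKENIZE), then keep whole-word matches.
    static List<String> tokenizeAndFilter(String errorCd) {
        List<String> kept = new ArrayList<>();
        for (String word : errorCd.trim().split("\\s+")) {
            if (CNTRL.matcher(word).matches()) { // the entire word must match
                kept.add(word);
            }
        }
        return kept;
    }

    // The "big ugly regex": at most maxCount occurrences can be captured.
    static List<String> boundedGroups(String errorCd, int maxCount) {
        StringBuilder regex = new StringBuilder(".*?(cntrl[a-b])");
        for (int i = 1; i < maxCount; i++) {
            regex.append("(?:.*?(cntrl[a-b]))?"); // one optional group per extra occurrence
        }
        regex.append(".*");
        Matcher m = Pattern.compile(regex.toString()).matcher(errorCd);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                if (m.group(i) != null) { // optional groups that matched nothing are null
                    groups.add(m.group(i));
                }
            }
        }
        return groups;
    }
}
```

The bounded version silently drops anything past maxCount (with maxCount 3, the question's string yields only the first three cntrlb values), which is why tokenizing scales better. In Pig itself, the tokenize-and-filter idea corresponds roughly to FLATTEN(TOKENIZE(error_cd)) followed by a FILTER on each word with MATCHES 'cntrl[a-b]'.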

Upvotes: 1
