Reputation: 707
I have a string field error_cd with the value "cntrlb cntrlb asdv cntrlb asvd cntrla cntrlb cntrlb".
In Pig, I'm trying to use REGEX_EXTRACT_ALL(error_cd, '.*(cntrl[a-b]).*') to get back a tuple of (cntrlb,cntrlb,cntrlb,cntrla,cntrlb,cntrlb), or just (cntrl,cntrl,...,cntrl). Instead, I'm getting back just one match, (cntrl).
Does anybody know how to return all of the matches, as the function name implies?
Upvotes: 0
Views: 354
Reputation: 5801
REGEX_EXTRACT_ALL extracts all of the capturing groups in a regular expression. It does not apply a single regular expression multiple times. Your pattern contains only one capturing group, so you get only one value back. This document is somewhat out of date, but it is still accurate for REGEX_EXTRACT_ALL.
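To make the distinction concrete outside of Pig, here is a minimal Python sketch. It assumes REGEX_EXTRACT_ALL behaves like a whole-string match that returns its capturing groups, as described above; Python's re.fullmatch and re.findall are only stand-ins for the two behaviours and are not Pig functions.

```python
import re

error_cd = "cntrlb cntrlb asdv cntrlb asvd cntrla cntrlb cntrlb"
pattern = r".*(cntrl[a-b]).*"

# Whole-string match: the pattern has one capturing group,
# so at most one captured value comes back.
m = re.fullmatch(pattern, error_cd)
groups = m.groups() if m else ()

# Applying the inner pattern repeatedly instead yields
# one value per occurrence in the string.
all_matches = re.findall(r"cntrl[a-b]", error_cd)

print(groups)
print(all_matches)
```

The first result holds a single value because the greedy leading .* leaves only one occurrence for the group to capture; the second holds one entry per occurrence.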
There is no regular expression that can capture an arbitrary number of groups. (See this question.) If there were a known upper limit on the number of times your cntrl string could occur, you could write a big, ugly regex to capture them all, but it sounds like you'd be better off using TOKENIZE and then treating each word in your string individually.
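The tokenize-then-filter idea can be sketched as follows. This is a Python illustration only, not Pig: str.split is a rough stand-in for TOKENIZE (assumed here to split on whitespace), and the list comprehension stands in for filtering the resulting words one at a time.

```python
import re

error_cd = "cntrlb cntrlb asdv cntrlb asvd cntrla cntrlb cntrlb"

# Split the field into individual words (stand-in for TOKENIZE).
words = error_cd.split()

# Test each word on its own and keep the ones that are
# entirely a cntrl code.
cntrl_words = [w for w in words if re.fullmatch(r"cntrl[a-b]", w)]

print(cntrl_words)
```

Because each word is tested separately, the number of occurrences in the original string no longer matters.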
Upvotes: 1