Newtang

Reputation: 6544

Pig Filter out NOT Matches

I have a bunch of strings with various prefixes, including "unknown:". I'd really like to filter out all the strings starting with "unknown:" in my Pig script, but it doesn't appear to work.

simpleFilter = FILTER records BY NOT(mystr MATCHES '^unknown');

I've tried a few other permutations of the regex, but it appears that MATCHES just doesn't work well with NOT. Am I missing something?

Using Pig 0.9.2

Upvotes: 4

Views: 21835

Answers (1)

jkovacs

Reputation: 3530

It's because the MATCHES operator works exactly like Java's String#matches, i.e. it tries to match the entire string and not just part of it (the prefix in your case). Your pattern '^unknown' only matches a string that is exactly "unknown", so NOT lets every prefixed string through. Just update your regular expression to match the entire string with your specified prefix, like so:

simpleFilter = FILTER records BY NOT(mystr MATCHES '^unknown.*');
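
If you want to see the whole-string behaviour outside of Pig, here is a small Java sketch. The class name MatchesDemo, the helper names, and the sample strings are made up for illustration; only String#matches and Matcher#find are standard Java calls.

```java
import java.util.regex.Pattern;

public class MatchesDemo {
    // Whole-string match, the behaviour the answer attributes to Pig's MATCHES.
    static boolean fullMatch(String s, String regex) {
        return s.matches(regex);
    }

    // Substring search, which is what the original '^unknown' pattern assumed.
    static boolean prefixFind(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String sample = "unknown:foo"; // hypothetical record value

        // false: the pattern covers only the first 7 characters of the string
        System.out.println(fullMatch(sample, "^unknown"));

        // true: '.*' consumes the rest of the string
        System.out.println(fullMatch(sample, "^unknown.*"));

        // true: find() is satisfied by a match at the start of the string
        System.out.println(prefixFind(sample, "^unknown"));
    }
}
```

With whole-string matching the leading ^ is redundant, so 'unknown.*' should behave the same as '^unknown.*' here.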

Upvotes: 19
