Reputation: 6544
I have a bunch of strings with various prefixes, including "unknown:". I'd like to filter out all the strings starting with "unknown:" in my Pig script, but my filter doesn't appear to work:
simpleFilter = FILTER records BY NOT(mystr MATCHES '^unknown');
I've tried a few other permutations of the regex, but it appears that MATCHES just doesn't work well with NOT. Am I missing something?
Using Pig 0.9.2
Upvotes: 4
Views: 21835
Reputation: 3530
It's because the MATCHES operator behaves exactly like Java's String#matches, i.e. it tries to match the entire string and not just part of it (the prefix in your case). Your pattern '^unknown' only matches a string that is exactly "unknown", so NOT lets everything else through. Just update your regular expression to match the entire string with your specified prefix, like so:
simpleFilter = FILTER records BY NOT(mystr MATCHES '^unknown.*');
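If you want to see the whole-string behaviour for yourself, here is a small plain-Java sketch of the same check. The class name `MatchesDemo`, the helper `keep`, and the sample strings are made up for illustration; the only assumption carried over from above is that MATCHES behaves like `String#matches`.

```java
import java.util.regex.Pattern;

public class MatchesDemo {
    // Hypothetical helper mirroring the Pig filter:
    // keep a string unless it starts with "unknown".
    static boolean keep(String s) {
        // String#matches must consume the whole input, so the trailing .* is required.
        return !s.matches("^unknown.*");
    }

    public static void main(String[] args) {
        String s = "unknown:foo"; // made-up sample value

        // The pattern covers only the prefix, so the whole-string match fails.
        System.out.println(s.matches("^unknown"));   // false

        // Adding .* lets the pattern consume the rest of the string.
        System.out.println(s.matches("^unknown.*")); // true

        // Matcher#find looks for a match anywhere, which is what the
        // original pattern was implicitly relying on.
        System.out.println(Pattern.compile("^unknown").matcher(s).find()); // true

        System.out.println(keep(s));           // false, filtered out
        System.out.println(keep("known:bar")); // true, kept
    }
}
```

Since the match is already anchored to the whole string, the leading `^` is redundant: `'unknown.*'` should behave the same. Also note that `.` does not match line terminators by default in Java regexes, so a value with an embedded newline would need something like the `(?s)` flag.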
Upvotes: 19