Reputation: 425
I am trying to filter a relation in Pig: I need all records in which the third field occurs somewhere in the first field's string.
I tried the following (assume my source relation is SRC):
Filtered= FILTER SRC BY $0 matches 'CONCAT(".*",$2,".")';
DUMP Filtered;
There is no syntax error but I am not getting any output for Filtered.
Upvotes: 2
Views: 1341
Reputation: 1043
Try this,
Filtered = FILTER SRC BY $0 matches CONCAT('(.*)', CONCAT($2, '(.*)'));
DUMP Filtered;
If the first field contains the third field, the record is kept. This is done with a regular expression.
Upvotes: 0
Reputation: 11
I think you just have some syntax errors:
Filtered= FILTER SRC BY ($0 matches CONCAT('.*', CONCAT($2, '.*')));
Upvotes: 1
Reputation: 41
In Pig 0.10.0, CONCAT takes only two arguments; see the documentation at http://pig.apache.org/docs/r0.10.0/func.html#concat
I'm not sure why it isn't complaining at runtime, but you'll want to nest two CONCAT calls, like
CONCAT('.*', CONCAT($2, '.*'))
to get the string you want.
Upvotes: 4
Reputation: 30089
I don't think the CONCAT is resolving to what you're expecting. More likely, matches is trying to match against the entire unevaluated string CONCAT(".*",$2,"."), which is why you are not getting any results.
Can you break this out into two statements: the first creating a field that contains the evaluated result of the CONCAT, and the second performing the matches operation?
TMP = FOREACH SRC GENERATE $0, CONCAT('.*', CONCAT($2, '.*'));
Filtered = FILTER TMP BY $0 matches $1;
DUMP Filtered;
Or something like that (completely untested)
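To illustrate why the quoted pattern finds nothing, here is a rough Java sketch. It assumes Pig's matches behaves like Java's String.matches (Java regex syntax, and the whole string has to match), which is my understanding of the operator. The class and method names are made up for the example and are not part of Pig.

```java
public class MatchesSketch {
    // Hypothetical helper: build the pattern from the field value first, then match.
    // String.matches succeeds only if the whole input matches the pattern.
    static boolean containsField(String first, String third) {
        // Same shape as CONCAT('.*', CONCAT($2, '.*')) in Pig.
        String pattern = ".*" + third + ".*";
        return first.matches(pattern);
    }

    // Hypothetical helper: the quoted text from the question used as the pattern itself.
    // Nothing is evaluated here; the characters C-O-N-C-A-T are part of the regex.
    static boolean literalPattern(String first) {
        return first.matches("CONCAT(\".*\",$2,\".\")");
    }

    public static void main(String[] args) {
        System.out.println(containsField("foobarbaz", "bar")); // true
        System.out.println(containsField("foo", "bar"));       // false
        System.out.println(literalPattern("foobarbaz"));       // false
    }
}
```

Note that the third field is used as regex source here, so a value containing characters such as . or * would be interpreted as regex syntax rather than matched literally.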
Upvotes: 2