Tapan Avasthi

Reputation: 425

Filtering in Pig

I am trying to filter a relation in Pig. I need all the records in which the third field occurs within the string in the first field.

I tried the following (assume my source relation is SRC):

Filtered= FILTER SRC BY $0 matches 'CONCAT(".*",$2,".")';
DUMP Filtered;

There is no syntax error, but I am not getting any output for Filtered.

Upvotes: 2

Views: 1341

Answers (4)

Rengasamy

Reputation: 1043

Try this:

Filtered= FILTER SRC BY $0 matches '(.*)$2(.*)';

DUMP Filtered;

If the first field contains the third field, the record is kept in the result. This is done using a regex.

Upvotes: 0

MendicantB

Reputation: 11

I think you just have some syntax errors:

  • As noted by A. Leistra, CONCAT only takes two arguments.
  • The "." at the end should be ".*" if you want wildcards on both sides.
  • The FILTER statement prefers parentheses around its condition.
  • Pig has a lot of weird edge cases involving double quotes, so use single quotes when you can.

Filtered= FILTER SRC BY ($0 matches CONCAT('.*', CONCAT($2, '.*')));
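For context, here is a minimal end-to-end sketch of the same filter. The file name 'input.txt', the tab delimiter, and the field names f0, f1, f2 are assumptions made for illustration, not details from the question. Declaring the fields as chararray avoids relying on implicit casts when they are passed to CONCAT.

```pig
-- Hypothetical input: a tab-separated file with three string fields.
SRC = LOAD 'input.txt' USING PigStorage('\t')
      AS (f0:chararray, f1:chararray, f2:chararray);

-- matches has to cover the whole string, so the third field is
-- wrapped in '.*' on both sides using two nested CONCAT calls.
Filtered = FILTER SRC BY (f0 matches CONCAT('.*', CONCAT(f2, '.*')));

DUMP Filtered;
```

Note that the third field is used as a regular expression here, so characters such as "." or "(" in it are treated as pattern syntax rather than literal text.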

Upvotes: 1

A. Leistra

Reputation: 41

Pig's CONCAT takes only two arguments. See the documentation at http://pig.apache.org/docs/r0.10.0/func.html#concat

I'm not sure why it isn't complaining at runtime, but you're going to want to string together two CONCAT statements, like

CONCAT('.*', CONCAT($2, '.'))

to get the string you want.

Upvotes: 4

Chris White

Reputation: 30089

I don't think the CONCAT is resolving to what you're expecting. More likely, matches is trying to match against the entire unevaluated string CONCAT(".*",$2,"."), which is why you are not getting any results.

Can you break this out into two statements: the first creating a field that contains the evaluated content of the CONCAT, and the second performing the matches operation?

TMP = FOREACH SRC GENERATE $0, CONCAT('.*', CONCAT($2, '.'));
Filtered = FILTER TMP BY $0 matches $1;
DUMP Filtered;

Or something like that (completely untested)
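A slightly fuller sketch of this two-statement idea follows, also untested against a real cluster. The file name, delimiter, field names (f0, f1, f2), and the regex_pat alias are all hypothetical. It uses nested two-argument CONCAT calls and single quotes, and it projects the helper column away again so the output has the same shape as SRC.

```pig
-- Hypothetical input: a tab-separated file with three string fields.
SRC = LOAD 'input.txt' USING PigStorage('\t')
      AS (f0:chararray, f1:chararray, f2:chararray);

-- Step 1: build the pattern as its own column.
TMP = FOREACH SRC GENERATE f0, f1, f2,
      CONCAT('.*', CONCAT(f2, '.*')) AS regex_pat;

-- Step 2: match the first field against the evaluated pattern.
Matched = FILTER TMP BY f0 matches regex_pat;

-- Drop the helper column before inspecting the result.
Filtered = FOREACH Matched GENERATE f0, f1, f2;

DUMP Filtered;
```

Keeping the pattern in its own column also makes it easy to DUMP TMP and check what the regex actually looks like for each record.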

Upvotes: 2
