YuliaPro

Reputation: 305

Using FILTER in a nested FOREACH in Pig

I have two Pig relations. The first, count_pairs, holds pairs of words and how many times each pair was seen, e.g. ((car,tire), 4). The second, word_counts, keeps track of how many times each word was seen, e.g. (car, 20). For each pair, I would like to find how many times the pair was seen as a fraction of how many times its first word was seen. In the example above I would want ((car,tire), 4/20). I tried to write a nested FOREACH to solve this problem:

> percent_count_pairs = FOREACH count_pairs {
> denom = FILTER word_counts BY ($0 ==count_pairs.pair.word1);
> GENERATE pair, count2/(double)denom.$1;}

I keep getting this error:

'Pig script failed to parse: 
<file src/cluster.pig, line 27, column 15> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)'

This points to the line with the FILTER; googling the error did not lead me to anything helpful. Please help! (PS: it does work if I take the line with FILTER out of the FOREACH.)

Upvotes: 2

Views: 9329

Answers (1)

YuliaPro

Reputation: 305

After more googling I realized that this is a bug in Pig that prevents filtering another relation inside a nested FOREACH: https://issues.apache.org/jira/browse/PIG-1798. I ended up writing my own UDF to do the filtering.
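
For readers who would rather not write a UDF, a JOIN on the first word of each pair is a common way to get the same ratio. This is a different technique from the UDF mentioned above, and only a minimal sketch: the schemas are not shown in the question, so the field names `word` and `total` for word_counts are assumptions, as is the assumption that count_pairs has the fields `pair:(word1, word2)` and `count2` used in the question's code.

```pig
-- Assumed schemas (hypothetical; adjust to the real ones):
--   count_pairs: (pair:(word1:chararray, word2:chararray), count2:long)
--   word_counts: (word:chararray, total:long)

-- Attach to each pair the total count of its first word.
joined = JOIN count_pairs BY pair.word1, word_counts BY word;

-- Divide the pair count by the first word's count; the casts force
-- floating-point division instead of integer division.
percent_count_pairs = FOREACH joined GENERATE
    count_pairs::pair AS pair,
    (double)count_pairs::count2 / (double)word_counts::total AS ratio;
```

With the example data from the question, the row for (car,tire) would carry the ratio 4/20. Note that an inner JOIN drops any pair whose first word is missing from word_counts.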

Upvotes: 2
