xinekiv680

Reputation: 1

Apache Pig Group by and Filter if multiple values exist?

I am trying to group rows that share the same ID, and then check whether each group contains both of two given values. For example, given these rows:

(10461 , 55 )
(10435 , 17 )
(10435 , 11 )
(10435 , 72 )
(10437 , 11 )
(10830 , 72 )

After I group it via:

groupedData = GROUP dataPoints BY data_id;

I get:

(10461 ,{(10461 , 55)})
(10435 ,{(10435 , 17),(10435 , 11),(10435 , 72)})

I want to filter the groups and get back the ID 10435, because its bag contains both 17 and 11.

Upvotes: 0

Views: 238

Answers (1)

pauljcg

Reputation: 193

You can use a nested FOREACH to filter each group's bag once per target value, and then keep only the groups where neither filtered bag is empty. I don't know what you've named the field holding the numbers (55, 17, 11, etc.), so I've called it value in the code below; replace it as needed.

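filteredBags = FOREACH groupedData {
    -- Within each group, keep only the rows matching each target value.
    seventeen = FILTER dataPoints BY value == 17;
    eleven = FILTER dataPoints BY value == 11;
    -- Emit the group key together with the two (possibly empty) bags.
    GENERATE
        group AS data_id,
        seventeen,
        eleven;
}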

nonNullBags = FILTER filteredBags BY NOT IsEmpty(seventeen) AND NOT IsEmpty(eleven);

finalIds = FOREACH nonNullBags GENERATE data_id;
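
To try this end to end, here is a minimal sketch that loads the sample rows from the question and runs the same steps. The file name data.csv, the comma delimiter, and the int field types are my assumptions, and value is still a placeholder field name.

```pig
-- Assumed input file data.csv (hypothetical name), one "data_id,value" pair per line:
--   10461,55
--   10435,17
--   10435,11
--   10435,72
--   10437,11
--   10830,72
dataPoints = LOAD 'data.csv' USING PigStorage(',') AS (data_id:int, value:int);

groupedData = GROUP dataPoints BY data_id;

-- For each group, build one bag per target value.
filteredBags = FOREACH groupedData {
    seventeen = FILTER dataPoints BY value == 17;
    eleven = FILTER dataPoints BY value == 11;
    GENERATE group AS data_id, seventeen, eleven;
}

-- Keep only the groups where both target values were found.
nonNullBags = FILTER filteredBags BY NOT IsEmpty(seventeen) AND NOT IsEmpty(eleven);

finalIds = FOREACH nonNullBags GENERATE data_id;

DUMP finalIds;
```

With the six sample rows, only 10435 has both 17 and 11 (10437 has 11 but not 17), so the dump should contain the single tuple (10435).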

Upvotes: 0
