Reputation: 306
I'm trying to calculate percentiles in Pig. I need to group the data by an attribute and then calculate percentiles for each tuple in the group, based on sales.
As far as I can tell, there is no built-in Pig function for this. Has anyone who has faced a similar problem got a suggestion?
Upvotes: 3
Views: 2728
Reputation: 1855
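As JaiPrakash mentioned, you can use the StreamingQuantile UDF from the Apache DataFu library. It estimates the quantiles in a single pass and does not need sorted input, so the results are approximate on large bags; DataFu also ships an exact Quantile UDF, which expects the bag to be sorted first. Since I already have an example ready, I'll just copy it here.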
Input
item1,234
item1,324
item1,769
item2,23
item2,23
item2,45
Pig script
register datafu-1.2.0.jar;
-- request the minimum (0.0), median (0.5) and maximum (1.0)
define Quantile datafu.pig.stats.StreamingQuantile('0.0','0.5','1.0');
data = load 'data' using PigStorage(',') as (item:chararray, value:int);
-- group by item and compute the quantiles of value within each group
quantiles = FOREACH (GROUP data by item) GENERATE group, Quantile(data.value);
dump quantiles;
Output
(item1,(234.0,324.0,769.0))
(item2,(23.0,23.0,45.0))
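If you want to cross-check the numbers on a small sample outside Pig, a minimal Python sketch like the one below should reproduce them. The helper name `quantiles_by_group` and the nearest-rank definition of a quantile are my own assumptions for illustration, not DataFu's implementation, so on larger inputs the values may not match StreamingQuantile exactly.

```python
import math
from collections import defaultdict

# Same sample rows as the input above: (item, value)
ROWS = [
    ("item1", 234), ("item1", 324), ("item1", 769),
    ("item2", 23), ("item2", 23), ("item2", 45),
]


def quantiles_by_group(rows, probs=(0.0, 0.5, 1.0)):
    """Exact nearest-rank quantiles of value per item (hypothetical helper)."""
    groups = defaultdict(list)
    for item, value in rows:
        groups[item].append(value)

    result = {}
    for item, values in groups.items():
        values.sort()
        n = len(values)
        picked = []
        for p in probs:
            # Nearest-rank: 1-based rank ceil(p * n), clamped so p = 0 gives the minimum
            rank = max(1, math.ceil(p * n))
            picked.append(float(values[rank - 1]))
        result[item] = tuple(picked)
    return result


if __name__ == "__main__":
    for item, qs in sorted(quantiles_by_group(ROWS).items()):
        print((item, qs))
```

Passing other values in `probs`, for example `(0.25, 0.75, 0.9)`, gives the corresponding percentiles under the same nearest-rank assumption.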
Upvotes: 6