Reputation: 121
I want to write a map-reduce algorithm for finding top N values ( A or D order) for each Group
Input data
a,1
a,9
b,3
b,5
a,4
a,7
b,1
c,1
c,9
c,-2
d,1
b,1
a,10
1,19
output type 1
a 1,4,7,9 ,10 , 19
b 1,,1,3,5
c -2,1,9
d 1
output type 2
a 19, 10 , 9,7,4,1
b 5,3,1,1
c 9,1,-2
d 1
output type 1 for top 3
a 1,4,7
b 1,,1,3
c -2,1
d 1
Please guide me
Upvotes: 0
Views: 1193
Reputation: 31515
In Scalding (assuming data in tsv) it would be something like
Tsv(path, ('key, 'value)).groupBy('key)(_.sortWithTake('value -> 'value, N))
.write(Tsv(outputPath))
Upvotes: 0
Reputation: 2538
You need to write a mapper that will split the input line by comma and produce a pair of Text
, IntWritable
:
Text('a,1') -> (mapper) -> Text('a'), IntWritable(1)
In reducer you will have the group and the list of values. You need to select the top K
values from the list with priority queue:
// add all values to priority queue
PriorityQueue<Integer> queue = new PriorityQueue<Integer>();
for (IntWritable value : values)
queue.add(value.get());
// get first K elements from priority queue
String topK = String.valueOf(queue.poll());
for (int i = 0; i < K - 1; ++i)
topK += ", " + queue.poll();
Upvotes: 2