user3575226
user3575226

Reputation: 121

top-K values for each Group using mapreduce

I want to write a map-reduce algorithm for finding top N values ( A or D order) for each Group

Input data
a,1
a,9
b,3
b,5
a,4
a,7
b,1
c,1
c,9
c,-2
d,1
b,1
a,10
1,19 

output type 1
a 1,4,7,9 ,10 , 19
b 1,,1,3,5
c -2,1,9
d 1

output type 2
a 19, 10 , 9,7,4,1
b 5,3,1,1
c 9,1,-2
d 1

output type 1 for top 3

a 1,4,7
b 1,,1,3
c -2,1
d 1

Please guide me

Upvotes: 0

Views: 1193

Answers (2)

samthebest
samthebest

Reputation: 31515

In Scalding (assuming data in tsv) it would be something like

Tsv(path, ('key, 'value)).groupBy('key)(_.sortWithTake('value -> 'value, N))
.write(Tsv(outputPath))

Upvotes: 0

Aleksei Shestakov
Aleksei Shestakov

Reputation: 2538

You need to write a mapper that will split the input line by comma and produce a pair of Text, IntWritable:

Text('a,1') -> (mapper) -> Text('a'), IntWritable(1)

In reducer you will have the group and the list of values. You need to select the top K values from the list with priority queue:

// add all values to priority queue
PriorityQueue<Integer> queue = new PriorityQueue<Integer>();
for (IntWritable value : values)
    queue.add(value.get());

// get first K elements from priority queue
String topK = String.valueOf(queue.poll());
for (int i = 0; i < K - 1; ++i)
    topK += ", " + queue.poll();

Upvotes: 2

Related Questions