faboys

Reputation: 57

How to find top 10 elements in MapReduce

I am trying to write a Python MapReduce job to compute certain statistics over some datasets I have. This is an example of the input data and the form it comes in:

exchange,stock_symbol,date,stock_price_open,stock_price_high,stock_price_low,stock_price_close,stock_volume,stock_price_adj_close

I need to find the top 10 days on which the most stock was traded, where the amount traded is calculated as: stock_price_close * stock_volume

Here is the code I have right now:

from mrjob.job import MRJob

class MapReduce(MRJob):

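    def mapper(self, _, line):
        # Columns: 1 = stock_symbol, 2 = date, 6 = stock_price_close, 7 = stock_volume
        values = line.split(',')
        amount = int(float(values[6]) * float(values[7]))
        code = values[1]
        date = values[2]
        record = (code, date, amount)  # renamed so it no longer shadows the built-in list
        yield None, record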

if __name__ == '__main__':
    MapReduce.run()

However, I'm having trouble implementing a reducer method for this job, and I'm not sure how the reducer should work so that it returns only the top 10 elements. Can anyone help me out here?

Upvotes: 0

Views: 1258

Answers (1)

btilly

Reputation: 46497

Make this a multi-step job. The first step computes, per day, the total amount traded. The second step takes those totals, sorts them, and returns the top 10.
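A minimal sketch of those two steps, written as plain Python generator functions so the logic can be checked without a cluster. The function names and the run_locally helper are hypothetical and only for illustration; the column positions (2 = date, 6 = stock_price_close, 7 = stock_volume) are taken from the question, and skipping rows that fail to parse is an assumption about how the header line should be handled.

```python
import heapq
from collections import defaultdict


def map_trade_value(line):
    """Step 1 mapper: emit (date, close * volume) for one CSV line."""
    fields = [f.strip() for f in line.split(',')]
    try:
        value = float(fields[6]) * float(fields[7])
    except (IndexError, ValueError):
        return  # assumption: silently skip the header row and malformed lines
    yield fields[2], value


def sum_per_day(date, values):
    """Step 1 reducer: total traded value for one date.

    Emits under the single key None so that step 2 sees every total at once.
    """
    yield None, (sum(values), date)


def top_n(_, totals, n=10):
    """Step 2 reducer: keep only the n largest (total, date) pairs."""
    for total, date in heapq.nlargest(n, totals):
        yield date, total


def run_locally(lines, n=10):
    """Hypothetical helper: simulate the shuffle between steps in memory."""
    grouped = defaultdict(list)
    for line in lines:
        for date, value in map_trade_value(line):
            grouped[date].append(value)
    totals = [pair
              for date, values in grouped.items()
              for _, pair in sum_per_day(date, values)]
    return list(top_n(None, totals, n))
```

With mrjob, these three functions would become methods on the job class (taking self as the first argument), and the job would override steps() to return two MRStep objects imported from mrjob.step: one with the mapper and the summing reducer, and a second with only the top-N reducer. Sending everything to one key in the second step is reasonable here because there is only one total per day, and heapq.nlargest avoids sorting the full list.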

Upvotes: 2
