Reputation: 57
I am trying to write a Python MapReduce job to compute some statistics over datasets I have. This is an example of the input data and the form it comes in:
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
I need to find the top 10 days on which the most stock was traded, where the amount traded is calculated as: stock_price_close * stock_volume
Here is the code I have right now:
from mrjob.job import MRJob

class MapReduce(MRJob):
    def mapper(self, _, line):
        values = line.split(',')
        amount = int(float(values[6]) * float(values[7]))
        code = values[1]
        date = values[2]
        list = (code, date, amount)
        yield(None, list)

if __name__ == '__main__':
    MapReduce.run()
However, I'm having trouble implementing a reducer for this job. I'm not sure how the reducer should work or how to make it return only the top 10 elements. Can anyone help me out here?
Upvotes: 0
Views: 1258
Reputation: 46497
Make this a multi-step job. The first step computes the total amount traded per day. The second step collects those totals, sorts them, and returns the top 10.
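A minimal sketch of those two steps, written as plain generator functions so the logic can be tried without a cluster. The function names and the `run_local` helper are hypothetical, and `run_local` only imitates the grouping that the framework would do between steps. Two assumptions on my part: "per day" means keying on the date column alone (summing across all symbols), and lines that fail to parse (such as a header row) are skipped. It also uses `heapq.nlargest` instead of a full sort, which gives the same top 10.

```python
import heapq
from collections import defaultdict


def mapper_amount(_, line):
    """Step 1 mapper: emit (date, amount traded) for one CSV line."""
    values = line.split(',')
    try:
        amount = float(values[6]) * float(values[7])  # close * volume
    except (IndexError, ValueError):
        return  # assumption: silently skip a header row or malformed line
    yield values[2].strip(), amount


def reducer_sum(date, amounts):
    """Step 1 reducer: total per date, re-keyed to None so that
    step 2 receives every daily total in a single group."""
    yield None, (sum(amounts), date)


def reducer_top10(_, totals):
    """Step 2 reducer: keep only the 10 largest (total, date) pairs."""
    for total, date in heapq.nlargest(10, totals):
        yield date, total


def run_local(lines):
    """Hypothetical in-memory stand-in for the shuffle between steps."""
    by_date = defaultdict(list)
    for line in lines:
        for key, value in mapper_amount(None, line):
            by_date[key].append(value)

    by_none = defaultdict(list)
    for key, values in by_date.items():
        for out_key, out_value in reducer_sum(key, values):
            by_none[out_key].append(out_value)

    results = []
    for key, values in by_none.items():
        results.extend(reducer_top10(key, values))
    return results
```

To run this under mrjob, these would become methods (with `self` as the first argument) on the `MRJob` subclass, and `steps()` would return `[MRStep(mapper=self.mapper_amount, reducer=self.reducer_sum), MRStep(reducer=self.reducer_top10)]`, with `MRStep` imported from `mrjob.step`. Funnelling everything through the `None` key means a single reducer sees all daily totals, which is fine here because there is only one total per day.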
Upvotes: 2