How to calculate Centered Moving Average of a set of data in Hadoop Map-Reduce?

Question

I want to calculate Centered Moving average of a set of Data .

Example Input format :

quarter | sales      
Q1'11   | 9            
Q2'11   | 8
Q3'11   | 9
Q4'11   | 12
Q1'12   | 9
Q2'12   | 12
Q3'12   | 9
Q4'12   | 10

Mathematical Representation of data and calculation of Moving average and then centered moving average

Period   Value   MA  Centered
1          9
1.5
2          8
2.5              9.5
3          9            9.5
3.5              9.5
4          12           10.0
4.5              10.5
5          9            10.750
5.5              11.0
6          12
6.5
7          9

I am stuck with the implementation of RecordReader which will provide mapper sales value of a year i.e. of four quarter.

Joe K · Accepted Answer

This is actually totally doable in the MapReduce paradigm; it does not have to be though of as a 'sliding window'. Instead think of the fact that each data point is relevant to a max of four MA calculations, and remember that each call to the map function can emit more than one key-value pair. Here is pseudo-code:

First MR job:

map(quarter, sales)
    emit(quarter - 1.5, sales)
    emit(quarter - 0.5, sales)
    emit(quarter + 0.5, sales)
    emit(quarter + 1.5, sales)

reduce(quarter, list_of_sales)
    if (list_of_sales.length == 4):
        emit(quarter, average(list_of_sales))
    endif


Second MR job:

map(quarter, MA)
    emit(quarter - 0.5, MA)
    emit(quarter + 0.5, MA)

reduce(quarter, list_of_MA)
    if (list_of_MA.length == 2):
        emit(quarter, average(list_of_sales))
    endif

How to calculate Centered Moving Average of a set of data in Hadoop Map-Reduce?

Answers (2)

Related Questions