codious

Reputation: 3511

Map-Reduce storing NaN when counting over large groups

Map:

function () {
    emit(this.thread, {
        max_year: this.date.getFullYear(),
        min_year: this.date.getFullYear(),
        max_month: this.date.getMonth(),
        min_month: this.date.getMonth(),
        count: 1
    });
};

Reduce:

function (key, values) {
    var max_year = values[0].max_year;
    var min_year = values[0].min_year;
    var max_month = values[0].max_month;
    var min_month = values[0].min_month;
    var sum = 0;
    if (values.length > 1) {
        for (var i in values) {
            if (values[i].max_year > max_year) {
                max_year = values[i].max_year;
            }
            if (values[i].min_year < min_year) {
                min_year = values[i].min_year;
            }
            if (values[i].max_month > max_month) {
                max_month = values[i].max_month;
            }
            if (values[i].min_month < min_month) {
                min_month = values[i].min_month;
            }
            sum += values[i].count;
        }
    }

    return {"max year": max_year, "min year": min_year, "max month": max_month, "min month": min_month, "No of posts": sum};
}

output:

{u'_id': u'Sujet  Top 5 TED POST', u'value': {u'No of posts': 8.0, u'min month': 0.0, u'max month': 6.0, u'max year': 2011.0, u'min year': 2010.0}}
{u'_id': u'Sujet  Top 5 des meilleurs guitaristes de lhistoire du Rock', u'value': {u'No of posts': 42.0, u'min month': 2.0, u'max month': 10.0, u'max year': 2011.0, u'min year': 2009.0}}
{u'_id': u'Sujet  Top ALEJANDRO GONZALEZ INARRITU', u'value': {u'No of posts': 29.0, u'min month': 0.0, u'max month': 9.0, u'max year': 2011.0, u'min year': 2008.0}}
{u'_id': u'Sujet  Top ANDY et LARRY WACHOWSKY', u'value': {u'No of posts': 40.0, u'min month': 0.0, u'max month': 11.0, u'max year': 2011.0, u'min year': 2008.0}}
{u'_id': u'Sujet  Top BRYAN SINGER', u'value': {u'No of posts': 50.0, u'min month': 0.0, u'max month': 11.0, u'max year': 2011.0, u'min year': 2006.0}}
{u'_id': u'Sujet  Top Cinma 2010', u'value': {u'No of posts': nan, u'min month': None, u'max month': None, u'max year': None, u'min year': None}}
{u'_id': u'Sujet  Top Cinma 2011', u'value': {u'No of posts': nan, u'min month': None, u'max month': None, u'max year': None, u'min year': None}}

As you can see, for some threads the "No of posts" field is NaN and the other fields are None. This doesn't happen when I use Map-Reduce only to count the number of posts, without working on the timestamps. I also notice that NaN appears when "No of posts" is large (around 1000 or so). Also, without the count and sum, all the max/min year and month calculations are correct. Thank you.
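The symptom can be reproduced outside MongoDB with a small plain-JavaScript sketch that runs in Node. It assumes, as the answer below explains, that MongoDB sometimes passes reduce's own output back into reduce. The four posts, the two-batch split, and the names `reduceOriginal` and `emitted` are hypothetical and only for illustration.

```javascript
// Sketch (plain Node, no MongoDB): what happens when the output of reduce
// is fed back into reduce ("re-reduce"). Data and batch split are made up.

// The question's reduce function, unchanged apart from local declarations.
function reduceOriginal(key, values) {
  var max_year = values[0].max_year;
  var min_year = values[0].min_year;
  var max_month = values[0].max_month;
  var min_month = values[0].min_month;
  var sum = 0;
  if (values.length > 1) {
    for (var i in values) {
      if (values[i].max_year > max_year) max_year = values[i].max_year;
      if (values[i].min_year < min_year) min_year = values[i].min_year;
      if (values[i].max_month > max_month) max_month = values[i].max_month;
      if (values[i].min_month < min_month) min_month = values[i].min_month;
      sum += values[i].count;
    }
  }
  return {"max year": max_year, "min year": min_year,
          "max month": max_month, "min month": min_month,
          "No of posts": sum};
}

// Hypothetical helper: the object map would emit for one post.
function emitted(year, month) {
  return {max_year: year, min_year: year,
          max_month: month, min_month: month, count: 1};
}

var batchA = [emitted(2010, 0), emitted(2010, 5)];
var batchB = [emitted(2011, 3), emitted(2011, 6)];

// First pass: each batch is reduced on its own and the counts look right.
var partialA = reduceOriginal("thread", batchA);
var partialB = reduceOriginal("thread", batchB);

// Second pass: the partial results are reduced again. They have no `count`
// or `max_year` properties, so `sum += undefined` yields NaN and every
// min/max field stays undefined (which comes back as None in Python).
var result = reduceOriginal("thread", [partialA, partialB]);
console.log(result);
```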

Upvotes: 1

Views: 342

Answers (1)

dcrosta

Reputation: 26258

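Your reduce function needs to return a value in the same format as the second argument to emit(): because of the way MongoDB Map-Reduce works, the result of one reduce call may be passed into reduce again, together with other values for the same key. That is where the NaN and None values come from. On the second pass, values[i].count is undefined (the earlier result stored the count under "No of posts"), so sum += undefined produces NaN, and values[0].max_year and the other fields are undefined, which come back as None. Re-reducing tends to happen for keys with many emitted values, which would explain why only the threads with many posts are affected. Specifically, you just need to adjust the key names in the object you return from your reduce: for instance, use max_year rather than "max year", and count rather than "No of posts".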
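A minimal sketch of a reduce that follows this rule, written as plain JavaScript so it can be checked in Node before being passed to mapReduce. The names `reduceFixed` and `emitted` and the sample posts are hypothetical. Month fields are still tracked independently of the year, as in the question's code.

```javascript
// Sketch: a reduce whose return value has the same shape as what map emits,
// so reducing it again is safe. Plain JavaScript, runnable in Node.
function reduceFixed(key, values) {
  // Start from the first value; fold every value (including the first) in.
  var result = {
    max_year: values[0].max_year,
    min_year: values[0].min_year,
    max_month: values[0].max_month,
    min_month: values[0].min_month,
    count: 0
  };
  // No `values.length > 1` guard: this loop is also correct for one value.
  values.forEach(function (v) {
    result.max_year = Math.max(result.max_year, v.max_year);
    result.min_year = Math.min(result.min_year, v.min_year);
    result.max_month = Math.max(result.max_month, v.max_month);
    result.min_month = Math.min(result.min_month, v.min_month);
    result.count += v.count;
  });
  return result; // same keys as the object passed to emit()
}

// Hypothetical helper: the object map would emit for one post.
function emitted(year, month) {
  return {max_year: year, min_year: year,
          max_month: month, min_month: month, count: 1};
}

var posts = [emitted(2010, 0), emitted(2010, 5), emitted(2011, 3), emitted(2011, 6)];

// Reducing everything at once and reducing in two batches, then
// re-reducing the partial results, should give the same answer.
var oneShot = reduceFixed("thread", posts);
var reReduced = reduceFixed("thread", [
  reduceFixed("thread", posts.slice(0, 2)),
  reduceFixed("thread", posts.slice(2))
]);
console.log(oneShot, reReduced);
```

If the spaced display names such as "max year" are still wanted, they could be applied in a finalize function, which MongoDB runs once per key after all reduce passes, so renaming there does not affect re-reduce.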

For more on writing correct map and reduce functions, see the MongoDB Map-Reduce documentation.

Upvotes: 2
