Alaa Qutaish

Reputation: 239

MapReduce misconception.

I am writing my second mapReduce job to get the top ten songs played by every user over the last week. The collection contains a nested "activities" document that holds an array of songs, each with a song_id, a counter and a date. The counter is the number of times the song was played.

I was able to accomplish this and output the needed results using only "map", without needing to reduce the emitted values. Is this the wrong approach? What is the best way to do this?

Here is the map function:

var map = function() {
    // Skip documents that have no activity data.
    if (this.activities === undefined || this.activities.songs === undefined) {
        return;
    }
    var limit = 10;
    // Compare full timestamps: getDate() returns only the day of the month,
    // so subtracting 7 from it breaks across month boundaries.
    var weekAgo = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
    var songs = this.activities.songs;
    var user_songs = [];
    for (var i = 0; i < songs.length; i++) {
        if (songs[i].date > weekAgo) {
            user_songs.push([songs[i].song_id, songs[i].counter]);
        }
    }
    // Sort by play count, descending, and keep at most `limit` songs.
    user_songs.sort(function(a, b) { return b[1] - a[1]; });
    emit({user_id: this.id}, {songs: user_songs.slice(0, limit)});
};

Here is the empty reduce method:

var reduce = function(key, values) {};

Upvotes: 1

Views: 295

Answers (1)

Eric Alberson

Reputation: 1116

You shouldn't need a reduce function. Based on the input data it won't be necessary, and I'll explain why.

As a simplified recap: in MapReduce, the mapper takes the input, splits it up by key, and passes (key, value) pairs to the reducer. The reducer then aggregates each (key, [list of values]) pair into some useful output.
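To make that flow concrete, here is a minimal in-memory sketch of it. This is a toy model, not MongoDB itself: `runMapReduce` is a hypothetical helper, `emit` is passed to the mapper as an argument rather than being a global, and the flat one-document-per-play layout in `plays` is an assumption chosen to show a case where a reducer really is needed.

```javascript
// Toy, in-memory model of the map -> group -> reduce flow (not MongoDB itself).
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  // Map phase: each document may emit any number of (key, value) pairs.
  for (const doc of docs) {
    map.call(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Reduce phase: collapse each key's list of values into one result.
  const out = {};
  for (const [key, values] of groups) {
    out[key] = reduce(key, values);
  }
  return out;
}

// Hypothetical flat layout: one document per play event.
const plays = [
  { user_id: "u1", song_id: "s1", counter: 2 },
  { user_id: "u1", song_id: "s1", counter: 3 },
  { user_id: "u2", song_id: "s9", counter: 1 },
];

const totals = runMapReduce(
  plays,
  function (emit) { emit(this.user_id, this.counter); },
  function (key, values) { return values.reduce((a, b) => a + b, 0); }
);
console.log(totals); // { u1: 5, u2: 1 }
```

Here user u1 appears in two documents, so the same key is emitted twice and the reducer has real work to do.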

In your case, the key is the user ID and the value is the top 10 songs that user listened to. Because of the way the data is laid out, it is already organized into (key, [list of values]) pairs: each document holds a user ID together with every song that user listened to, so there is nothing left to reduce.

Basically, the reduce step would combine each (user ID, song) pair into a list of the user's songs. But that has already been done; it is inherent in the data. So in this specific case, the mapper is the only function you need.
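A small sketch of why this holds, assuming one document per user (the sample `users` data, the global `emit` stand-in and the `reduceCalls` counter are all made up for illustration, and the date filter is left out for brevity). MongoDB does not call the reduce function for a key that has only a single value, which the loop at the end imitates:

```javascript
// Assumed layout: one document per user, with the songs nested inside it.
const users = [
  { id: "u1", activities: { songs: [
    { song_id: "s1", counter: 5 },
    { song_id: "s2", counter: 9 },
  ] } },
  { id: "u2", activities: { songs: [{ song_id: "s3", counter: 1 }] } },
];

// Stand-in for MongoDB's emit(): collect the values emitted under each key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Mapper: sort this user's songs by play count and keep the top 10.
function map() {
  const top = this.activities.songs
    .map((s) => [s.song_id, s.counter])
    .sort((a, b) => b[1] - a[1])
    .slice(0, 10);
  emit(this.id, { songs: top });
}

users.forEach((doc) => map.call(doc));

// Reduce is only needed for keys that received more than one value.
let reduceCalls = 0;
const results = {};
for (const [key, values] of emitted) {
  if (values.length > 1) {
    reduceCalls++; // a real reduce would have to merge the values here
    results[key] = values;
  } else {
    results[key] = values[0];
  }
}
console.log(reduceCalls); // 0
```

If the same user could ever appear in more than one document, the key would be emitted more than once and the empty reduce would no longer be safe, since it returns undefined.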

Upvotes: 3
