Reputation: 239
I am writing my second mapReduce job to get the top ten songs each user played over the last week. The collection contains a nested "activities" document that holds an array of songs, each with a song_id, counter and date. The counter is the number of times the song was played.
I was able to accomplish this and output the needed results using only "map", without reducing the emitted values. Is this the wrong approach? What is the best way to do this?
    var map = function() {
        // Only songs played within the last seven days are considered.
        var cutoff = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
        var limit = 10;
        var user_songs = [];
        // Check that the nested array exists before reading its length.
        if (this.activities !== undefined && this.activities.songs !== undefined) {
            var songs = this.activities.songs;
            for (var i = 0; i < songs.length; i++) {
                // Compare full timestamps; getDate() only returns the day of the month.
                if (songs[i].date > cutoff) {
                    user_songs.push([songs[i].song_id, songs[i].counter]);
                }
            }
            // Sort by play count, descending, and keep at most `limit` entries.
            user_songs.sort(function(a, b) { return b[1] - a[1]; });
            var user_top_songs = user_songs.slice(0, limit);
            emit({user_id: this.id}, {songs: user_top_songs});
        }
    };
    // Only called when a key has more than one emitted value; each user document emits once.
    var reduce = function(key, values) { return values[0]; };
Upvotes: 1
Views: 295
Reputation: 1116
You shouldn't need a reduce function. Based on the input data it won't be necessary, and I'll explain why.
To recall in simplified terms: in MapReduce, the mapper function takes the input, splits it up by key, and passes the (key, value) pairs to the reducer. The reducer then aggregates the (key, [list of values]) pairs into some useful output.
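As a rough illustration of that flow, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB's API: runMapReduce, the plays array and its user and count fields are hypothetical names made up for this example. It shows that the reducer only has work to do when one key receives several values.

```javascript
// Minimal in-memory model of the map -> group -> reduce flow (not MongoDB's API).
function runMapReduce(docs, map, reduce) {
    var groups = new Map();
    docs.forEach(function (doc) {
        // emit() collects each (key, value) pair under its key.
        var emit = function (key, value) {
            if (!groups.has(key)) groups.set(key, []);
            groups.get(key).push(value);
        };
        map.call(doc, emit);
    });
    var results = {};
    groups.forEach(function (values, key) {
        // reduce is only needed when a key received more than one value.
        results[key] = values.length > 1 ? reduce(key, values) : values[0];
    });
    return results;
}

// Hypothetical sample data: one document per play event.
var plays = [
    { user: "ann", count: 2 },
    { user: "bob", count: 5 },
    { user: "ann", count: 3 }
];

var totals = runMapReduce(
    plays,
    function (emit) { emit(this.user, this.count); },
    function (key, values) { return values.reduce(function (a, b) { return a + b; }, 0); }
);
console.log(totals); // { ann: 5, bob: 5 }
```

Here "ann" appears in two documents, so her values are reduced; "bob" appears once, so his value passes through untouched.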
In your case, the key is the user ID, and the value is the top 10 songs they listened to. Because of the way the data is laid out, it is already organized into (key, [list of values]) pairs: each user ID is stored together with every song that user listened to, so there is no need to reduce.
Basically, the reduce step would combine each (user ID, song) pair into a list of the user's songs. But that has already been done; it is inherent in the data. So, in this specific case, the mapper is the only function needed to accomplish what you want.
Upvotes: 3