Helmut Januschka

Reputation: 1636

MongoDB aggregation/mapReduce

I am trying to extract the unique keys and all unique values from an object in my MongoDB documents.

Let's say I have a document structure like:

{
    "userId": "1234",
    "formFields": {
        "field1": "value1",
        "field2": "value2"
    }
},
{
    "userId": "1234",
    "formFields": {
        "field3": "value3",
        "field1": "value1-edited"
    }
},
{
    "userId": "1234",
    "formFields": {
        "field3": "value3",
        "field1": "value1-edited"
    }
}

I want to aggregate all documents of user "1234" to get the distinct values of each key in "formFields".

The result should look something like this:

{
    "_id": "1234",
    "formFields": {
        "field1": [
            "value1",
            "value1-edited"
        ],
        "field2": [
            "value2"
        ],
        "field3": [
            "value3"
        ]
    }
}

The keys in formFields are dynamic. I have experimented with aggregate and mapReduce, but haven't found a working sample that can be used as a basis.

Can anyone answer this?

Thanks

Regards, Helmut

Upvotes: 0

Views: 70

Answers (1)

chridam

Reputation: 103365

Since the keys are dynamic and you don't know them beforehand, you first need a list of those fields so that you can use them in your aggregation pipeline. One way to get that list is mapReduce. In the map-reduce operation below, the keys of the formFields subdocument are written to an output collection "uploads_keys" (one document per key, with the key as its _id), and that list is then used to build the aggregation pipeline expressions:

var mr = db.runCommand({
    "mapreduce" : "uploads",
    "map" : function() {
        for (var key in this.formFields) { emit(key, null); }
    },
    "reduce" : function(key, stuff) {
        return null;
    },
    "out": "uploads" + "_keys"
});
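To see what the key-discovery step above yields without a server, here is a minimal in-memory sketch in plain JavaScript. It assumes the three sample documents from the question; the helper name collectKeys is hypothetical and is not part of MongoDB.

```javascript
// In-memory stand-in for the "uploads" collection (sample documents from the question).
const docs = [
    { userId: "1234", formFields: { field1: "value1", field2: "value2" } },
    { userId: "1234", formFields: { field3: "value3", field1: "value1-edited" } },
    { userId: "1234", formFields: { field3: "value3", field1: "value1-edited" } }
];

// Hypothetical helper: mimics the map step (emit every key of formFields).
// The Set plays the role of the reduce step, collapsing repeated keys.
function collectKeys(documents) {
    const keys = new Set();
    for (const doc of documents) {
        for (const key in doc.formFields) {
            keys.add(key);
        }
    }
    return Array.from(keys).sort();
}

// Logs the three distinct keys: field1, field2, field3
console.log(collectKeys(docs));
```

These are the same values that distinct("_id") would be expected to return from the output collection for this sample data.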

var groupKeys = {},
    projectKeys = { "formFields": {} };
db[mr.result].distinct("_id").forEach(function (key) {
    // $addToSet keeps each value once per user; $push would also keep repeats
    groupKeys[key] = { "$addToSet": "$formFields." + key };
    projectKeys["formFields"][key] = "$" + key;
});

groupKeys["_id"] = "$userId";

> printjson(groupKeys);
{
        "field1" : {
                "$addToSet" : "$formFields.field1"
        },
        "field2" : {
                "$addToSet" : "$formFields.field2"
        },
        "field3" : {
                "$addToSet" : "$formFields.field3"
        },
        "_id" : "$userId"
}
> printjson(projectKeys);
{
        "formFields" : {
                "field1" : "$field1",
                "field2" : "$field2",
                "field3" : "$field3"
        }
}

You can then use those variables in your aggregation pipeline:

db.uploads.aggregate([
    { "$group": groupKeys },
    { "$project": projectKeys }
])
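As a possible alternative that avoids the separate key-discovery step, the sketch below builds the result in a single pipeline. It is not the approach described above; it assumes a server version that provides the $objectToArray and $arrayToObject operators, and it reuses the collection name "uploads" and the user id "1234" from this thread. The field names pairs, key and values are illustrative only.

```javascript
// Sketch only: assumes the server supports $objectToArray and $arrayToObject.
const pipeline = [
    // Keep only the documents of the user in question.
    { "$match": { "userId": "1234" } },
    // Turn { field1: "value1", ... } into [ { k: "field1", v: "value1" }, ... ].
    { "$project": { "userId": 1, "pairs": { "$objectToArray": "$formFields" } } },
    // One document per key/value pair.
    { "$unwind": "$pairs" },
    // Collect the distinct values seen for each (user, key) combination.
    { "$group": {
        "_id": { "userId": "$userId", "key": "$pairs.k" },
        "values": { "$addToSet": "$pairs.v" }
    } },
    // Gather the per-key arrays back under each user as k/v pairs.
    { "$group": {
        "_id": "$_id.userId",
        "pairs": { "$push": { "k": "$_id.key", "v": "$values" } }
    } },
    // Rebuild the formFields object from the k/v pairs.
    { "$project": { "formFields": { "$arrayToObject": "$pairs" } } }
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    printjson(db.uploads.aggregate(pipeline).toArray());
}
```

The order of values inside each array is not guaranteed with $addToSet, so sort them on the client if a stable order matters.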

Upvotes: 1
