Kiruchi

Reputation: 195

Distinct/Aggregation query Mongodb array, trim trailing space

I have a MongoDB collection whose documents contain a colours array like this:

myCollection :

{
_id : ...,
"colours" : [ 
    {
        "colourpercentage" : "42",
        "colourname" : "Blue"
    }, 
    {
        "colourpercentage" : "32",
        "colourname" : "Red"
    }, 
    {
        "colourpercentage" : "10",
        "colourname" : "Green "
    }
  ]
}

I would like to retrieve every distinct colourname of every entry of this collection, and be able to filter it with a search.

I tried distinct, without success. After searching further I found that an aggregation could help. So far I have:

db.getCollection('myCollection').aggregate([
    { "$match": { "colours.colourname": /Gre/ } }, // Gre is my search
    { "$unwind": "$colours" },
    { "$match": { "colours.colourname": /Gre/ } },
    { "$group": {
       "_id": "$colours.colourname"
    }}
])

It works, but I get a result like this:

{
"result" : [ 
    {
        "_id" : "Grey"
    }, 
    {
        "_id" : "Light Green "
    }, 
    {
        "_id" : "Light Green"
    }, 
    {
        "_id" : "Green "
    }, 
    {
        "_id" : "Green"
    }
],
"ok" : 1.0000000000000000
}

I would like to merge the duplicate entries that differ only by a trailing space, and display the result like this:

["Grey","Light Green","Green"]

Upvotes: 3

Views: 1819

Answers (1)

chridam

Reputation: 103365

One approach you could take is Map-Reduce. The JavaScript-interpreter-driven mapReduce takes a bit longer than the aggregation framework, but it works here because it lets you use some very useful native JavaScript functions that the aggregation framework lacks. For instance, in the map function you can call trim() to remove any trailing spaces from your colourname fields, so that you emit the "cleansed" keys.

The Map-Reduce operation would typically have the following map and reduce functions:

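var map = function() {
    // Skip documents that have no colours array
    if (!this.colours) return;
    this.colours.forEach(function (c) {
        // Emit the trimmed name so "Green " and "Green" share one key
        emit(c.colourname.trim(), 1);
    });
};

var reduce = function(key, values) {
    // Sum the 1s emitted for this key
    var count = 0;
    for (var i = 0; i < values.length; i++) {
        count += values[i];
    }
    return count;
};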

db.runCommand( { mapreduce : "myCollection", map : map , reduce : reduce , out : "map_reduce_result" } );

You can then query the map_reduce_result collection with the regex to get the result:

var getDistinctKeys = function (doc) { return doc._id };
var result = db.map_reduce_result.find({ "_id": /Gre/ }).map(getDistinctKeys);
printjson(result); // prints ["Green", "Grey", "Light Green"]
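
To see why trimming inside the map step collapses the duplicates, the same map and reduce logic can be mimicked in plain Python on a few in-memory documents. This is only an illustrative sketch: sample_docs, map_colours and distinct_colour_counts are made-up names, the data is invented to resemble the question's, and nothing here talks to MongoDB.

```python
import re
from collections import defaultdict


def map_colours(doc):
    """Mimic the map function: yield (trimmed colour name, 1) per array element."""
    for colour in doc.get("colours") or []:
        yield colour["colourname"].strip(), 1


def distinct_colour_counts(docs):
    """Mimic mapReduce: collect emitted values per key, then sum them."""
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_colours(doc):
            emitted[key].append(value)
    # Reduce step: one total per distinct (trimmed) key
    return {key: sum(values) for key, values in emitted.items()}


# Invented sample data with trailing spaces, similar to the question's
sample_docs = [
    {"colours": [{"colourname": "Blue"}, {"colourname": "Green "}]},
    {"colours": [{"colourname": "Green"}, {"colourname": "Light Green "}]},
    {"colours": [{"colourname": "Grey"}, {"colourname": "Light Green"}]},
    {"other": "document without a colours array"},
]

counts = distinct_colour_counts(sample_docs)
# Same idea as find({"_id": /Gre/}) on the output collection
matches = sorted(name for name in counts if re.search("Gre", name))
print(matches)  # ['Green', 'Grey', 'Light Green']
```

Because the key is trimmed before it is emitted, "Green " and "Green" land in the same bucket, which is what removes the duplicates.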

-- UPDATE --

To implement this in Python: PyMongo's API supports all of the features of MongoDB's map/reduce engine, so you could try the following (Collection.map_reduce is available in PyMongo versions before 4.0):

import pymongo
import re
from bson.code import Code

client = pymongo.MongoClient("localhost", 27017)
db = client.test
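# JavaScript map function: emit each trimmed colour name once per occurrence
mapper = Code("function () {"
              "   if (!this.colours) return;"
              "   this.colours.forEach(function (c) {"
              "       emit(c.colourname.trim(), 1);"
              "   });"
              "}")

# JavaScript reduce function: sum the emitted 1s for each key
reducer = Code("function (key, values) {"
               "   var count = 0;"
               "   for (var i = 0; i < values.length; i++) {"
               "       count += values[i];"
               "   }"
               "   return count;"
               "}")

# Writes the output to map_reduce_result and returns that collection
result = db.myCollection.map_reduce(mapper, reducer, "map_reduce_result")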
regx = re.compile("Gre", re.IGNORECASE)

for doc in result.find({"_id": regx}):
    print(doc)
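
As an alternative sketch: on a MongoDB server of version 4.0 or newer, the aggregation framework offers a $trim operator, so the pipeline from the question could trim the names before grouping instead of going through Map-Reduce. The helper name build_distinct_colour_pipeline and the intermediate field "name" are illustrative assumptions. The code only builds the pipeline and does not run it against a server.

```python
def build_distinct_colour_pipeline(search):
    """Return an aggregation pipeline listing distinct trimmed colour names.

    `search` is used as a regular expression, like /Gre/ in the question.
    """
    return [
        # Pre-filter: keep only documents with at least one matching colour
        {"$match": {"colours.colourname": {"$regex": search}}},
        # One output document per element of the colours array
        {"$unwind": "$colours"},
        # Strip leading and trailing whitespace ($trim assumes MongoDB 4.0+)
        {"$project": {"_id": 0,
                      "name": {"$trim": {"input": "$colours.colourname"}}}},
        # Drop array elements that do not match the search themselves
        {"$match": {"name": {"$regex": search}}},
        # Grouping on the trimmed name removes the duplicates
        {"$group": {"_id": "$name"}},
        {"$sort": {"_id": 1}},
    ]


pipeline = build_distinct_colour_pipeline("Gre")

# Hypothetical usage with the PyMongo `db` handle defined above:
# names = [doc["_id"] for doc in db.myCollection.aggregate(pipeline)]
```

The trim happens before the $group stage, so "Green " and "Green" end up with the same _id.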

Upvotes: 2
