Reputation: 195
I have a MongoDB collection whose documents contain a colours array like this:
myCollection :
{
_id : ...,
"colours" : [
{
"colourpercentage" : "42",
"colourname" : "Blue"
},
{
"colourpercentage" : "32",
"colourname" : "Red"
},
{
"colourpercentage" : "10",
"colourname" : "Green "
}
]
}
I would like to retrieve every distinct colourname across all documents in this collection, and be able to filter the result with a search term.
I tried distinct without success. I searched further and found that an aggregation could help. For the moment I have:
db.getCollection('myCollection').aggregate([
    { "$match": { "colours.colourname": /Gre/ } }, // Gre is my search
    { "$unwind": "$colours" },
    { "$match": { "colours.colourname": /Gre/ } },
    { "$group": {
        "_id": "$colours.colourname"
    }}
])
This works, but I get a result like:
{
"result" : [
{
"_id" : "Grey"
},
{
"_id" : "Light Green "
},
{
"_id" : "Light Green"
},
{
"_id" : "Green "
},
{
"_id" : "Green"
}
],
"ok" : 1.0000000000000000
}
I would like to merge the duplicate entries that differ only by a trailing space, and get the result as:
["Grey","Light Green","Green"]
Upvotes: 3
Views: 1819
Reputation: 103365
One approach you could take is Map-Reduce. The JavaScript-interpreter-driven mapReduce takes a bit longer than the aggregation framework, but it works here because it lets you use native JavaScript functions that the aggregation framework lacks. For instance, in the map function you could use trim() to remove any trailing spaces from your colourname fields, so that you emit the "cleansed" keys.
The Map-Reduce operation would typically have the following map and reduce functions:
// Emit each colour name, trimmed of surrounding whitespace, with a count of 1.
var map = function() {
    if (!this.colours) return;
    this.colours.forEach(function (c) {
        emit(c.colourname.trim(), 1);
    });
};
// Sum the counts emitted for each distinct colour name.
var reduce = function(key, values) {
    var count = 0;
    for (var index in values) {
        count += values[index];
    }
    return count;
};
db.runCommand( { mapreduce : "myCollection", map : map , reduce : reduce , out : "map_reduce_result" } );
You can then query the map_reduce_result collection with the regex to get the result:
var getDistinctKeys = function (doc) { return doc._id };
var result = db.map_reduce_result.find({ "_id": /Gre/ }).map(getDistinctKeys);
printjson(result); // prints ["Green", "Grey", "Light Green"]
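To see why trimming in the map step collapses the duplicates, here is a minimal pure-Python simulation of the same map and reduce logic. It does not talk to MongoDB at all; the sample documents and the helper names map_phase and map_reduce are made up for illustration, and Python's strip() stands in for JavaScript's trim().

```python
from collections import Counter

# Hypothetical sample documents mirroring the question's schema.
docs = [
    {"colours": [{"colourname": "Blue"}, {"colourname": "Green "}]},
    {"colours": [{"colourname": "Green"}, {"colourname": "Light Green "}]},
    {"colours": [{"colourname": "Light Green"}, {"colourname": "Grey"}]},
    {"name": "document without a colours field"},
]


def map_phase(doc):
    """Yield (trimmed colourname, 1) pairs, like emit() in the map function."""
    for colour in doc.get("colours", []):
        yield colour["colourname"].strip(), 1


def map_reduce(documents):
    """Sum the emitted values per key, like the reduce function."""
    counts = Counter()
    for doc in documents:
        for key, value in map_phase(doc):
            counts[key] += value
    return dict(counts)


counts = map_reduce(docs)
# Filter the keys with the search term, as the regex query on _id does.
matches = sorted(name for name in counts if "Gre" in name)
print(matches)  # ['Green', 'Grey', 'Light Green']
```

Because "Green " and "Green" produce the same key once trimmed, they end up in a single entry with a count of 2 instead of two separate entries.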
-- UPDATE --
To implement this in Python: PyMongo's API supports all of the features of MongoDB's map/reduce engine, so you could try the following:
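import re

import pymongo
from bson.code import Code

# Note: Collection.map_reduce() was removed in PyMongo 4.0, so this needs PyMongo 3.x.
client = pymongo.MongoClient("localhost", 27017)
db = client.test

# Emit each colour name, trimmed of surrounding whitespace, with a count of 1.
mapper = Code("function () {"
              "    if (!this.colours) return;"
              "    this.colours.forEach(function (c) {"
              "        emit(c.colourname.trim(), 1);"
              "    });"
              "}")
# Sum the counts emitted for each distinct colour name.
reducer = Code("function (key, values) {"
               "    var count = 0;"
               "    for (var index in values) {"
               "        count += values[index];"
               "    }"
               "    return count;"
               "}")

# Write the output to map_reduce_result, then filter that collection by regex.
result = db.myCollection.map_reduce(mapper, reducer, "map_reduce_result")
regx = re.compile("Gre", re.IGNORECASE)
for doc in result.find({"_id": regx}):
    print(doc)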
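If your server is MongoDB 4.0 or later, the aggregation framework has since gained a $trim operator, so the same result may be reachable without Map-Reduce. The following is only a sketch under that assumption: build_pipeline and distinct_colours are hypothetical helper names, and the pipeline assumes every colourname is a string. The search term is escaped so it is matched literally.

```python
import re


def build_pipeline(search):
    """Return an aggregation pipeline listing distinct trimmed colour names."""
    pattern = re.compile(re.escape(search))
    return [
        # Keep only documents that can contribute at least one match.
        {"$match": {"colours.colourname": pattern}},
        {"$unwind": "$colours"},
        # Group on the trimmed name so "Green " and "Green" share one key.
        {"$group": {"_id": {"$trim": {"input": "$colours.colourname"}}}},
        # Drop the non-matching colours of the documents matched above.
        {"$match": {"_id": pattern}},
        {"$sort": {"_id": 1}},
    ]


def distinct_colours(collection, search):
    """Run the pipeline on a PyMongo collection and return a flat list."""
    return [doc["_id"] for doc in collection.aggregate(build_pipeline(search))]


# Usage, assuming a reachable server and the collection from the question:
#   import pymongo
#   db = pymongo.MongoClient("localhost", 27017).test
#   print(distinct_colours(db.myCollection, "Gre"))
```

Grouping on the trimmed value is what merges the duplicates; the final list comprehension flattens the {"_id": ...} documents into the plain array the question asks for.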
Upvotes: 2