Reputation: 5000
I have some documents
{name: 'apple', type: 'fruit', color: 'red'}
{name: 'banana', type: 'fruit', color: 'yellow'}
{name: 'orange', type: 'fruit', color: 'orange'}
{name: 'eggplant', type: 'vege', color: 'purple'}
{name: 'brocoli', type: 'vege', color: 'green'}
{name: 'rose', type: 'flower', color: 'red'}
{name: 'cauli', type: 'vege', color: 'white'}
{name: 'potato', type: 'vege', color: 'brown'}
{name: 'onion', type: 'vege', color: 'white'}
{name: 'strawberry', type: 'fruit', color: 'red'}
{name: 'cashew', type: 'nut', color: ''}
{name: 'almond', type: 'nut', color: ''}
{name: 'lemon', type: 'vege', color: 'yellow'}
{name: 'tomato', type: 'vege', color: 'red'}
{name: 'tomato', type: 'fruit', color: 'red'}
{name: 'fig', type: 'fruit', color: 'pink'}
{name: 'nectarin', type: 'fruit', color: 'pink'}
I want to group them into alphabets like below
{
_id:'a',
name:['apple','almond'],
type:[],
color:[]
}
{
_id:'b',
name:['banana','brocoli'],
type:[],
color:['brown']
}
...
{
_id:'f',
name:['fig'],
type:['fruit','flower'],
color:['']
}
...
{
_id:'n',
name:['nectarin'],
type:['nut'],
color:['']
}
...
{
_id:'p',
name:['potato'],
type:[''],
color:['pink','purple']
}
...
The result can be saved into another collection. So I can issue a query in the newly created collection: find({_id:'a'})
to return name, type and color begins with the letter 'a'.
I have thought about using $group
$group: {
_id: $substr: ['$name', 0, 1],
name: {$addToSet: '$name'},
}
Then another command
$group: {
_id: $substr: ['$type', 0, 1],
name: {$addToSet: '$type'},
}
And
$group: {
_id: $substr: ['$color', 0, 1],
name: {$addToSet: '$color'},
}
But I am stuck at how to unify all three together to save into a new collection. Or is aggregation framework not suitable for this kind of data summary?
In a real world example, e.g. a e-commerce site, the front page displays something like: "currently we have 135636
products under 231
categories from 111
brands". Surely, these numbers should be cached somewhere (in memory or in another collection), because running $group
each time is resource intensive? What would be the optimal schema/design for these situations?
Sorry, my questions are a bit 'confusing'.
Upvotes: 2
Views: 212
Reputation: 50406
Since you have multiple arrays here, the key thing is to "merge" them all into one for the simplest processing.
The $map
operator of the aggregation framework works well here, as well as transforming elements so that you get your "first letter" from each word within the data:
db.alpha.aggregate([
{ "$project": {
"list": {
"$map": {
"input": [ "A", "B", "C" ],
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
{
"type": { "$literal": "name" },
"value": "$name",
"alpha": { "$substr": [ "$name",0,1 ] }
},
{ "$cond": [
{ "$eq": [ "$$el", "B" ] },
{
"type": { "$literal": "type" },
"value": "$type",
"alpha": { "$substr": [ "$type",0,1 ] }
},
{
"type": { "$literal": "color" },
"value": "$color",
"alpha": { "$substr": [ "$color",0,1 ] }
}
]}
]
}
}
}
}},
{ "$unwind": "$list" },
{ "$match": { "list.alpha": { "$ne": "" } } },
{ "$group": {
"_id": "$list.alpha",
"list": {
"$addToSet": "$list"
}
}},
{ "$project": {
"name": {
"$setDifference": [
{ "$map": {
"input": "$list",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "name" ] },
"$$el.value",
false
]
}
}},
[false]
]
},
"type": {
"$setDifference": [
{ "$map": {
"input": "$list",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "type" ] },
"$$el.value",
false
]
}
}},
[false]
]
},
"color": {
"$setDifference": [
{ "$map": {
"input": "$list",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "color" ] },
"$$el.value",
false
]
}
}},
[false]
]
}
}},
{ "$sort": { "_id": 1 } }
])
If you look at the data in "stages" it makes a lot of sense what is happening here in the tranformation.
The first stage "maps" all of the fields into a single array per document, so all documents now look like this:
{
"_id" : ObjectId("55df0652c9064ef625d7f36e"),
"list" : [
{
"type" : "name",
"value" : "nectarin",
"alpha" : "n"
},
{
"type" : "type",
"value" : "fruit",
"alpha" : "f"
},
{
"type" : "color",
"value" : "pink",
"alpha" : "p"
}
]
}
The $unwind
is of little consequence, as it does the standard and creates new documents from each member. It is the $group
that does most of the work here with this result per "alpha" in the grouping:
{
"_id" : "o",
"list" : [
{
"type" : "name",
"value" : "orange",
"alpha" : "o"
},
{
"type" : "color",
"value" : "orange",
"alpha" : "o"
},
{
"type" : "name",
"value" : "onion",
"alpha" : "o"
}
]
}
That has a nice grouping, and is arguably a decent output format. But in order to get to the end results then the $map
operator is employed again alongside $setDifference
which can be used to remove the false
values where each field "type" conversion does not match the required output field.
The full result is:
{ "_id" : "a", "name" : [ "almond", "apple" ], "type" : [ ], "color" : [ ] }
{ "_id" : "b", "name" : [ "brocoli", "banana" ], "type" : [ ], "color" : [ "brown" ] }
{ "_id" : "c", "name" : [ "cashew", "cauli" ], "type" : [ ], "color" : [ ] }
{ "_id" : "e", "name" : [ "eggplant" ], "type" : [ ], "color" : [ ] }
{ "_id" : "f", "name" : [ "fig" ], "type" : [ "flower", "fruit" ], "color" : [ ] }
{ "_id" : "g", "name" : [ ], "type" : [ ], "color" : [ "green" ] }
{ "_id" : "l", "name" : [ "lemon" ], "type" : [ ], "color" : [ ] }
{ "_id" : "n", "name" : [ "nectarin" ], "type" : [ "nut" ], "color" : [ ] }
{ "_id" : "o", "name" : [ "onion", "orange" ], "type" : [ ], "color" : [ "orange" ] }
{ "_id" : "p", "name" : [ "potato" ], "type" : [ ], "color" : [ "pink", "purple" ] }
{ "_id" : "r", "name" : [ "rose" ], "type" : [ ], "color" : [ "red" ] }
{ "_id" : "s", "name" : [ "strawberry" ], "type" : [ ], "color" : [ ] }
{ "_id" : "t", "name" : [ "tomato" ], "type" : [ ], "color" : [ ] }
{ "_id" : "v", "name" : [ ], "type" : [ "vege" ], "color" : [ ] }
{ "_id" : "w", "name" : [ ], "type" : [ ], "color" : [ "white" ] }
{ "_id" : "y", "name" : [ ], "type" : [ ], "color" : [ "yellow" ] }
Where everything is grouped alphabetically and with their own arrays for each field.
Upcoming releases of MongoDB will have a $filter
that makes the $map
and $setDifference
combination a bit nicer. But that does not make "sets", not that it matters much to this process as long as $addToSet
is employed where it is.
Thinking about this, I would like to "advise" that considering the amount of data you want to process here that the resulting "arrays" for each letter might just possibly exceed the BSON limits depending on how many distinct "words" there actually are.
In which case the "advice" here would be follow the process right up to and including the $match
, but then only $group
afterwards like this:
{ "$group": {
"_id": {
"alpha": "$list.alpha",
"type": "$list.type",
"value": "$list.value",
}
}},
{ "$sort": { "_id": 1 } }
It's longer output of course, but will not exceed the BSON limit for documents at any stage.
Upvotes: 2
Reputation: 7840
Using aggregation you should use some complex aggregation query. First find out all name
first letters using substr
after that create all name,type and color
array using group
use $map to check whether given name starts with or not
$setDifference used to remove duplicate empty parameter and finally $out used for writing documents in new collection.
Check this aggregation query :
db.collection.aggregate({
"$project": {
"firstName": {
"$substr": ["$name", 0, 1]
},
"name": 1,
"type": 1,
"color": 1
}
}, {
"$group": {
"_id": null,
"allName": {
"$push": "$name"
},
"allType": {
"$push": "$type"
},
"allColor": {
"$push": "$color"
},
"allfirstName": {
"$push": "$firstName"
}
}
}, {
"$unwind": "$allfirstName"
}, {
"$group": {
"_id": "$allfirstName",
"allType": {
"$first": "$allType"
},
"allName": {
"$first": "$allName"
},
"allColor": {
"$first": "$allColor"
}
}
}, {
"$project": {
"type": {
"$setDifference": [{
"$map": {
"input": "$allType",
"as": "type",
"in": {
"$cond": {
"if": {
"$eq": [{
"$substr": ["$$type", 0, 1]
}, "$_id"]
},
"then": "$$type",
"else": ""
}
}
}
},
[""]
]
},
"color": {
"$setDifference": [{
"$map": {
"input": "$allColor",
"as": "color",
"in": {
"$cond": {
"if": {
"$eq": [{
"$substr": ["$$color", 0, 1]
}, "$_id"]
},
"then": "$$color",
"else": ""
}
}
}
},
[""]
]
},
"name": {
"$setDifference": [{
"$map": {
"input": "$allName",
"as": "name",
"in": {
"$cond": {
"if": {
"$eq": [{
"$substr": ["$$name", 0, 1]
}, "$_id"]
},
"then": "$$name",
"else": ""
}
}
}
},
[""]
]
}
}
}, {
"$sort": {
"_id": 1
}
}, {
"$out": "newCollection"
})
Upvotes: 0