Reputation: 3051
Consider the following example:
db.article.aggregate(
{ $group : {
_id : "$author",
docsPerAuthor : { $sum : 1 },
viewsPerAuthor : { $sum : "$pageViews" }
}}
);
This groups by the author field and computes two fields.
The values of $author have the form FirstName_LastName. Instead of grouping by $author, I want to group all authors who share the same LastName.
I tried using a regex to group by the matching string after the '_':
$author.match(/_[a-zA-Z0-9]+$/)
db.article.aggregate(
{ $group : {
_id : "$author".match(/_[a-zA-Z0-9]+$/),
docsPerAuthor : { $sum : 1 },
viewsPerAuthor : { $sum : "$pageViews" }
}}
);
I also tried the following:
db.article.aggregate(
{ $group : {
_id : {$author: {$regex: /_[a-zA-Z0-9]+$/}},
docsPerAuthor : { $sum : 1 },
viewsPerAuthor : { $sum : "$pageViews" }
}}
);
Upvotes: 13
Views: 20885
Reputation: 2274
Combining $group with $addFields and $arrayElemAt works for me (version ≥ 3.4).
Say we have the following data in collection faculty, database school:
{ "_id" : ObjectId("5ed5a59b1febc4c796a88e80"), "name" : "Harry_Potter" }
{ "_id" : ObjectId("5ed5a60e1febc4c796a88e81"), "name" : "Edison_Potter" }
{ "_id" : ObjectId("5ed5a6231febc4c796a88e82"), "name" : "Jack_Potter" }
{ "_id" : ObjectId("5ed5a62f1febc4c796a88e83"), "name" : "Alice_Walker" }
{ "_id" : ObjectId("5ed5a65f1febc4c796a88e84"), "name" : "Bob_Walker" }
{ "_id" : ObjectId("5ed5a6731febc4c796a88e85"), "name" : "Will_Smith" }
The following pipeline groups the documents by last name:
db.faculty.aggregate([
{
$addFields: {
lastName: {
$arrayElemAt: [ { $split: ["$name", "_"] }, 1 ]
}
}
},
{
$group: {
_id: "$lastName",
count: {$sum: 1}
}
}
])
The result is:
{ "_id" : "Potter", "count" : 3 }
{ "_id" : "Walker", "count" : 2 }
{ "_id" : "Smith", "count" : 1 }
The trick is to add a field named lastName. Given the format of the name field, it can be split into an array on _. The last name is then at index 1 and the first name at index 0.
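Applied to the question's article collection, the same idea might look like the sketch below. The field names author and pageViews come from the question; the sample values and the helper groupByLastName are made up for illustration and are not part of MongoDB. Using index -1 instead of 1 picks the last segment, which should also cope with names that contain more than one underscore.

```javascript
// Pipeline for the question's `article` collection; field names are taken from the question.
const pipeline = [
  {
    $addFields: {
      // -1 picks the last segment after splitting on "_"
      lastName: { $arrayElemAt: [{ $split: ["$author", "_"] }, -1] }
    }
  },
  {
    $group: {
      _id: "$lastName",
      docsPerAuthor: { $sum: 1 },
      viewsPerAuthor: { $sum: "$pageViews" }
    }
  }
];
// In the mongo shell: db.article.aggregate(pipeline)

// Plain-JavaScript rendering of the same two stages, to check the logic without a server.
// `groupByLastName` is a hypothetical helper, not a MongoDB API.
function groupByLastName(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const parts = doc.author.split("_");
    const lastName = parts[parts.length - 1];
    const group = groups.get(lastName) ||
      { _id: lastName, docsPerAuthor: 0, viewsPerAuthor: 0 };
    group.docsPerAuthor += 1;
    group.viewsPerAuthor += doc.pageViews;
    groups.set(lastName, group);
  }
  return Array.from(groups.values());
}

// Made-up sample documents.
const sample = [
  { author: "Harry_Potter", pageViews: 10 },
  { author: "Jack_Potter", pageViews: 5 },
  { author: "Will_Smith", pageViews: 7 }
];
console.log(groupByLastName(sample));
```

The pipeline itself needs a server that supports $split and $addFields (3.4 or later); the local function is only there to make the grouping logic easy to inspect.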
Reference
Upvotes: 4
Reputation: 4656
Use mapReduce, which is the general form of aggregation. This is how to proceed in the mongo shell. First define the map function:
var mapFunction = function() {
// Group key: the "_LastName" suffix of the author field
var key = this.author.match(/_[a-zA-Z0-9]+$/)[0];
var value = {
docsPerAuthor: 1,
// pageViews is assumed to be a number per document, as in the question's $sum
viewsPerAuthor: this.pageViews
};
emit( key, value );
};
Then define the reduce function:
var reduceFunction = function(key, values) {
// Same shape as the values emitted by the map function
var reducedObject = {
docsPerAuthor: 0,
viewsPerAuthor: 0
};
values.forEach( function(value) {
reducedObject.docsPerAuthor += value.docsPerAuthor;
reducedObject.viewsPerAuthor += value.viewsPerAuthor;
}
);
return reducedObject;
};
Run mapReduce and save the result in the map_reduce_result collection:
db.article.mapReduce(mapFunction, reduceFunction, {out:'map_reduce_result'})
Then query map_reduce_result to see the result:
db.map_reduce_result.find()
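To get a feel for what the map and reduce steps produce, they can be tried in plain JavaScript without a server. This is only a rough sketch: runMapReduce is a hypothetical in-memory driver, not a MongoDB API, the sample documents are made up, and the map function assumes pageViews is a plain number.

```javascript
// Made-up documents shaped like the question's `article` collection.
const articles = [
  { author: "Harry_Potter", pageViews: 10 },
  { author: "Jack_Potter", pageViews: 5 },
  { author: "Will_Smith", pageViews: 7 }
];

const mapFunction = function () {
  // Key is the "_LastName" suffix, including the underscore
  const key = this.author.match(/_[a-zA-Z0-9]+$/)[0];
  emit(key, { docsPerAuthor: 1, viewsPerAuthor: this.pageViews });
};

const reduceFunction = function (key, values) {
  const reduced = { docsPerAuthor: 0, viewsPerAuthor: 0 };
  values.forEach(function (value) {
    reduced.docsPerAuthor += value.docsPerAuthor;
    reduced.viewsPerAuthor += value.viewsPerAuthor;
  });
  return reduced;
};

// Hypothetical helper that mimics the map -> group by key -> reduce flow.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map();
  // The map function calls a global emit(key, value), as in the mongo shell.
  globalThis.emit = function (key, value) {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach(function (doc) { map.call(doc); });
  delete globalThis.emit;

  const results = [];
  emitted.forEach(function (values, key) {
    // A key with a single emitted value is passed through without calling reduce.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  });
  return results;
}

const result = runMapReduce(articles, mapFunction, reduceFunction);
console.log(JSON.stringify(result));
```

Note that the keys keep the leading underscore ("_Potter"), because the regular expression matches it as part of the suffix.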
Upvotes: 4
Reputation: 2217
A possible workaround within the aggregation framework is to use $project to compute the author's last name. However, it is clumsy, because you have to enumerate every possible first-name length by hand.
Here, the field name is computed as the substring after the '_' character by testing each possible position of the '_' in turn (hence the chain of $cond), falling back to the whole $author if the first name is too long:
http://mongotry.herokuapp.com/#?bookmarkId=52fb5f24a0378802003b4c68
[
{
"$project": {
"author": 1,
"pageViews": 1,
"name": {
"$cond": [
{
"$eq": [
{
"$substr": [
"$author",
0,
1
]
},
"_"
]
},
{
"$substr": [
"$author",
1,
999
]
},
{
"$cond": [
{
"$eq": [
{
"$substr": [
"$author",
1,
1
]
},
"_"
]
},
{
"$substr": [
"$author",
2,
999
]
},
{
"$cond": [
{
"$eq": [
{
"$substr": [
"$author",
2,
1
]
},
"_"
]
},
{
"$substr": [
"$author",
3,
999
]
},
{
"$cond": [
{
"$eq": [
{
"$substr": [
"$author",
3,
1
]
},
"_"
]
},
{
"$substr": [
"$author",
4,
999
]
},
{
"$cond": [
{
"$eq": [
{
"$substr": [
"$author",
4,
1
]
},
"_"
]
},
{
"$substr": [
"$author",
5,
999
]
},
"$author"
]
}
]
}
]
}
]
}
]
}
}
},
{
"$group": {
"_id": "$name",
"viewsPerAuthor": {
"$sum": "$pageViews"
}
}
}
]
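Since the pipeline is just a JavaScript object, the nested $cond chain does not have to be typed by hand. A minimal sketch, assuming a hypothetical helper named buildNameExpr (not part of MongoDB), could generate the same structure for underscore positions 0 through 4:

```javascript
// Hypothetical helper: builds the nested $cond chain for underscore positions 0..maxPos.
function buildNameExpr(field, maxPos) {
  // Innermost fallback: the whole field when no "_" is found in range
  let expr = field;
  // Build from the deepest position outwards so position 0 ends up outermost
  for (let pos = maxPos; pos >= 0; pos--) {
    expr = {
      $cond: [
        { $eq: [{ $substr: [field, pos, 1] }, "_"] },
        { $substr: [field, pos + 1, 999] },
        expr
      ]
    };
  }
  return expr;
}

const pipeline = [
  { $project: { author: 1, pageViews: 1, name: buildNameExpr("$author", 4) } },
  { $group: { _id: "$name", viewsPerAuthor: { $sum: "$pageViews" } } }
];
// In the mongo shell: db.article.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Raising the second argument covers longer first names, at the cost of a deeper expression.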
Upvotes: 3
Reputation: 3150
As far as I can tell, no operator provides this kind of functionality, or at least I could not find a version that includes it. I do not think it will work with $regex: http://docs.mongodb.org/manual/reference/operator/regex/ It is only for pattern matching.
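One way to see why the first attempt in the question cannot work: the shell evaluates "$author".match(...) as ordinary JavaScript on the literal string before anything is sent to the server. A small sketch (the variable names are only for illustration):

```javascript
// Evaluated client-side: the literal string "$author" contains no "_",
// so String.prototype.match returns null.
const groupId = "$author".match(/_[a-zA-Z0-9]+$/);
console.log(groupId); // null

// The stage that would actually be sent has a constant null _id,
// so every document would fall into a single group.
const stage = { $group: { _id: groupId, docsPerAuthor: { $sum: 1 } } };
console.log(JSON.stringify(stage));
```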
There is an improvement request in JIRA: https://jira.mongodb.org/browse/SERVER-6773
It is open and unresolved. However, I found this discussion on GitHub: https://github.com/mongodb/mongo/pull/336
And if you check this commit: https://github.com/nleite/mongo/commit/2dd175a5acda86aaad61f5eb9dab83ee19915709
it contains more or less exactly the method you are looking for. I am not sure about the status of this improvement; in 2.2.3 it does not work.
Upvotes: 6