Reputation: 24375
I have a large collection called Messages
, like so:
{
user: 94fg844f,
event: null,
group: null,
name: "Jake",
text: "Hello world"
}, {
user: fje93jrg4,
event: null,
group: null,
name: "Bob"
text: "Testing"
}, {
user: fje93jrg4,
event: null,
group: null,
name: "Bob"
text: "Text here"
}, {
user: null,
event: d0j3n9fn3,
group: null,
name: "My Event"
text: "Testing 2"
}, {
user: null,
event: d0j3n9fn3,
group: null,
name: "My Event"
text: "Another text"
}
I need to get the first occurrence of the users, events and groups.
For example, since user fje93jrg4
occurs twice, I just want to get the document back with the text of Testing
since the one with the text of Text here
is older than it. Same goes with the event
of d0j3n9fn3
. It occurs twice, although I just want to get the first document back of it with the text of Testing 2
.
I looked into distinct
although it seems to only support one search term, like user
instead of user
, event
and group
.
The final result of the above would be:
{
user: 94fg844f,
event: null,
group: null,
name: "Jake",
text: "Hello world"
}, {
user: fje93jrg4,
event: null,
group: null,
name: "Bob"
text: "Testing"
}, , {
user: null,
event: d0j3n9fn3,
group: null,
name: "My Event"
text: "Testing 2"
}
My guess is that I'll probably have to use an aggregate
with $first
or something along those lines. The problem with doing 3 different queries is that I need to apply a limit so that I always get 10 results back. For example, there could be no recent group
s in the mix, just event
s and user
s.
Upvotes: 2
Views: 2935
Reputation: 61225
We can use the aggregation framework to do this. First we need to $sort
by user
and "_id". From there, we then $group
by "user" and use the $last
accumulator operator to return the last document for each user. Note that we can also use the $first
accumulator operator if we sort our documents in descending order, but sorting in ascending order and using $last
make our intention clear.
db.collection.aggregate([
{ "$sort": { "user": 1, "_id": -1 } },
{ "$group": {
"_id": "$user",
"user": { "$last": "$$ROOT" }
}}
])
which produces:
{
"_id" : "fje93jrg4",
"user" : {
"_id" : 2,
"user" : "fje93jrg4",
"event" : null,
"group" : null,
"name" : "Bob",
"text" : "Testing"
}
}
{
"_id" : "94fg844f",
"user" : {
"_id" : 1,
"user" : "94fg844f",
"event" : null,
"group" : null,
"name" : "Jake",
"text" : "Hello world"
}
}
{
"_id" : null,
"user" : {
"_id" : 4,
"user" : null,
"event" : "d0j3n9fn3",
"group" : null,
"name" : "My Event",
"text" : "Testing 2"
}
}
We may want to add a $project
to our pipeline but doing so will cause a drop of performance. However it will reduce both the amount of data sent over the wire and the time and memory used to decode documents on the client side if do not need all of the key/value pairs in a document returned.
The $project
stage look like this:
{ "$project": {
"_id": "$user._id",
"user": "$user.user",
"event": "$user.event",
"group": "$user.group",
"name": "$user.name",
"text": "$user.text"
}}
Upvotes: 3
Reputation: 615
Works Perfect:
db.getCollection('Mytest').aggregate([{ $match : { user: "fje93jrg4" }},{$limit:1}])
Upvotes: -2