Reputation: 83
I'm new to MongoDB and I'm trying to write some queries for website comments.
I have a collection like this:
{
"_id": "post1"
"comments" : [
{
"user" : "goodman",
"text" : "hello",
"replies" : [
{
"user" : "littleboy",
"text" : "hi"
}
]
},
{
"user" : "richpapa",
"text" : "haha"
},
{
"user" : "goodman",
"text" : "how is it going?!"
}
]
}
{
"_id": "post2"
"comments" : [
{
"user" : "katze",
"text" : "meow!",
"replies" : [
{
"user" : "kitten",
"text" : "miaw miaw!"
},
{
"user" : "littleboy",
"text" : "mioooooo"
},
{
"user" : "goodman",
"text" : "meme"
}
]
},
{
"user" : "kitten",
"text" : "ich bin katze",
"replies" : [
{
"user" : "sally",
"text" : "ja, du bist katze maw"
}
]
}
]
}
Each post can have several comments and each comment can also have several replies. Here are what I failed to do:
first, I wanna print the name and comments of users who have replied to their own comments. Second, I wanna print the names and also the number of users who have sent more than one message on the website (both comments and replies should be considered).
Here is my try for the first part:
db.website.aggregate([{$group : {_id : "$comments.user",
repliers: {$addToSet : "$comments.replies.user"}}}])
If I'm not mistaken this will give the name of users who have sent comments and also the name of repliers as an array. But I could not use it to gain what I was looking for.
Upvotes: 3
Views: 698
Reputation: 7578
An alternate approach to the "comments-on-comments" approach is to flatten the structure, consider every entry a "post", and use $graphLookup
to bring it all together. The original post can carry extra information in the doc if necessary and the subsequent comments can carry less but the basic structure is the same. Consider the following entries; tabs are used to visually help make the hierarchy a little more obvious but watch the relationship between _id
and ref
:
var r = [
// target
{"_id": "P1", from:"OP", text: "orig", ts:new ISODate("2020-01-01")}
// level 0
,{"_id": "P2", from:"A", ref:"P1", text: "not bad", ts:new ISODate("2020-01-01")}
,{"_id": "P3", from:"B", ref:"P1", text: "corn", ts:new ISODate("2020-01-01")}
// level 1
,{"_id": "P4", from:"C", ref:"P3", text: "maybe", ts:new ISODate("2020-01-02")}
// level 2
,{"_id": "P41", from:"D", ref:"P4", text: "hoo", ts:new ISODate("2020-01-08")}
// level 3
,{"_id": "P61", from:"A", ref:"P41", text: "concerned", ts:new ISODate("2020-01-02")}
,{"_id": "P42", from:"E", ref:"P3", text: "hello", ts:new ISODate("2020-01-02")}
,{"_id": "P5", from:"OP2", text: "orig2", ts:new ISODate("2020-01-01")}
,{"_id": "P6", from:"C", ref:"P5", text: "another thing...", ts:new ISODate("2020-01-06")}
];
If we are looking for all dialogue starting with P1
, we can do this:
db.foo.aggregate([
{$match: {"_id": "P1"}},
{$graphLookup: {
from: "foo",
connectToField: "ref",
connectFromField: "_id",
startWith: "$_id", // usually value of connectFromField
depthField: "n",
as: "zz"
}}
]);
to yield this:
{
"_id" : "P1",
"from" : "OP",
"text" : "orig",
"ts" : ISODate("2020-01-01T00:00:00Z"),
"zz" : [
{
"_id" : "P42",
"from" : "E",
"ref" : "P3",
"text" : "hello",
"ts" : ISODate("2020-01-02T00:00:00Z"),
"n" : NumberLong(1)
},
{
"_id" : "P4",
"from" : "C",
"ref" : "P3",
"text" : "maybe",
"ts" : ISODate("2020-01-02T00:00:00Z"),
"n" : NumberLong(1)
},
{
"_id" : "P2",
"from" : "A",
"ref" : "P1",
"text" : "not bad",
"ts" : ISODate("2020-01-01T00:00:00Z"),
"n" : NumberLong(0)
},
{
"_id" : "P41",
"from" : "D",
"ref" : "P4",
"text" : "hoo",
"ts" : ISODate("2020-01-08T00:00:00Z"),
"n" : NumberLong(2)
},
{
"_id" : "P61",
"from" : "A",
"ref" : "P41",
"text" : "concerned",
"ts" : ISODate("2020-01-02T00:00:00Z"),
"n" : NumberLong(3)
},
{
"_id" : "P3",
"from" : "B",
"ref" : "P1",
"text" : "corn",
"ts" : ISODate("2020-01-01T00:00:00Z"),
"n" : NumberLong(0)
}
]
}
There are a few interesting advantages to this design:
from
for example and you can rapidly get all input from a particular actor. Finding the most recent comment in the hierarchy or the newest comments since you last looked (state somewhere of course) becomes trivial. Scanning for inappropriate language is similarly trivial because there is a single text
field in the same position in every document.$graphLookup
supplies the depthField
(although acknowledged NOT in numeric order...) it becomes easy to trim the response tree. For example to get only 2 of the first level (level 0) replies, you could add this stage after $graphLookup
: ,{$addFields: {zz: {$slice: [ {$filter: {
input: "$zz",
cond: {$eq:[0,"$$this.n"]}
}}, 0, 2] }
}}
Note that the hierarchy and dates are independent. To find the latest reply to a comment in a thread, we could use the following stage. Note that 2020-01-08
is the latest entry on doc P41
:
,{$addFields: {zz: {$reduce: {
input: "$zz",
initialValue: {ts:new ISODate("2000-01-01")}, // init to a REALLY old date
in: {$cond: [
{$gt:["$$this.ts","$$value.ts"]}, // if higher than prev
"$$this", // then use this one
"$$value" // else keep highest seen so far
]}
}}
}}
{
"_id" : "P1",
"from" : "OP",
"text" : "orig",
"ts" : ISODate("2020-01-01T00:00:00Z"),
"zz" : {
"_id" : "P41",
"from" : "D",
"ref" : "P4",
"text" : "hoo",
"ts" : ISODate("2020-01-08T00:00:00Z"),
"n" : NumberLong(2)
}
}
Lastly, sometimes you will want to work on the initial target doc as well as the chain of $graphLookup
docs. To do so, simply $project
a concatentation of the top-level doc info with the array resulting from $graphLookup
:
,{$project: {zz: {$concatArrays: [
// Make array of 1. Call n -1 to identify it as the
// initial target doc:
[{_id: "$_id",n:-1,from:"$from",text:"$text",ts:"$ts"}], // make array of 1
"$zz"
]}
}}
{
"_id" : "P1",
"zz" : [
{
"_id" : "P1",
"n" : -1,
"from" : "OP",
"text" : "orig",
"ts" : ISODate("2020-01-01T00:00:00Z")
},
{
"_id" : "P2",
"from" : "A",
"ref" : "P1",
"text" : "not bad",
"ts" : ISODate("2020-01-01T00:00:00Z"),
"n" : NumberLong(0)
},
....
Upvotes: 0
Reputation: 10707
To the first part:
db.collection.aggregate([
{
$unwind: "$comments"
},
{
$project: {
_id: 0,
items: {
$filter: {
input: "$comments.replies",
as: "item",
cond: {
$eq: [
"$$item.user",
"$comments.user"
]
}
}
}
}
},
{
$unwind: "$items"
},
{
$project: {
"self_repling_user": "$items.user",
"self_repling_comment": "$items.text"
}
}
])
Explained:
To the second part:
db.collection.aggregate([
{
$unwind: "$comments"
},
{
$unwind: "$comments.replies"
},
{
$group: {
_id: "allusers",
"user2": {
$push: "$comments.user"
},
user1: {
$push: "$comments.replies.user"
}
}
},
{
$project: {
users: {
$concatArrays: [
"$user1",
"$user2"
]
}
}
},
{
$unwind: "$users"
},
{
$group: {
_id: "$users",
count: {
$sum: 1
}
}
},
{
$match: {
count: {
$gt: 1
}
}
}
])
explained:
Upvotes: 1