Louis parker
Louis parker

Reputation: 83

queries on users' comments in MongoDB

I'm new to MongoDB and I'm trying to write some queries for website comments.

I have a collection like this:

{   
"_id": "post1"
"comments" : [ 
    {
        "user" : "goodman",
        "text" : "hello",
        "replies" : [ 
            {
                "user" : "littleboy",
                "text" : "hi"
            }
        ]
    }, 
    {
        "user" : "richpapa",
        "text" : "haha"
    }, 
    {
        "user" : "goodman",
        "text" : "how is it going?!"
    }
]
 }
 {
   "_id": "post2"
   "comments" : [ 
     {
        "user" : "katze",
        "text" : "meow!",
        "replies" : [ 
            {
                "user" : "kitten",
                "text" : "miaw miaw!"
            }, 
            {
                "user" : "littleboy",
                "text" : "mioooooo"
            }, 
            {
                "user" : "goodman",
                "text" : "meme"
            }
        ]
    }, 
    {
        "user" : "kitten",
        "text" : "ich bin katze",
        "replies" : [ 
            {
                "user" : "sally",
                "text" : "ja, du bist katze maw"
            }
        ]
    }
 ]
 }

Each post can have several comments and each comment can also have several replies. Here are what I failed to do:

first, I wanna print the name and comments of users who have replied to their own comments. Second, I wanna print the names and also the number of users who have sent more than one message on the website (both comments and replies should be considered).

Here is my try for the first part:

db.website.aggregate([{$group : {_id : "$comments.user", 
repliers: {$addToSet : "$comments.replies.user"}}}])

If I'm not mistaken this will give the name of users who have sent comments and also the name of repliers as an array. But I could not use it to gain what I was looking for.

Upvotes: 3

Views: 698

Answers (2)

Buzz Moschetti
Buzz Moschetti

Reputation: 7578

An alternate approach to the "comments-on-comments" approach is to flatten the structure, consider every entry a "post", and use $graphLookup to bring it all together. The original post can carry extra information in the doc if necessary and the subsequent comments can carry less but the basic structure is the same. Consider the following entries; tabs are used to visually help make the hierarchy a little more obvious but watch the relationship between _id and ref:

var r = [
    // target                                                                                                               
    {"_id": "P1", from:"OP", text: "orig", ts:new ISODate("2020-01-01")}
      // level 0                                                                                                            
      ,{"_id": "P2", from:"A", ref:"P1", text: "not bad", ts:new ISODate("2020-01-01")}
      ,{"_id": "P3", from:"B", ref:"P1", text: "corn", ts:new ISODate("2020-01-01")}
      // level 1                                                                                                          
        ,{"_id": "P4", from:"C", ref:"P3", text: "maybe", ts:new ISODate("2020-01-02")}
          // level 2                                                                                                        
          ,{"_id": "P41", from:"D", ref:"P4", text: "hoo", ts:new ISODate("2020-01-08")}
            // level 3                                                                                                      
            ,{"_id": "P61", from:"A", ref:"P41", text: "concerned", ts:new ISODate("2020-01-02")}

        ,{"_id": "P42", from:"E", ref:"P3", text: "hello", ts:new ISODate("2020-01-02")}


    ,{"_id": "P5", from:"OP2", text: "orig2",         ts:new ISODate("2020-01-01")}
    ,{"_id": "P6", from:"C", ref:"P5", text: "another thing...", ts:new ISODate("2020-01-06")}
];

If we are looking for all dialogue starting with P1, we can do this:

db.foo.aggregate([
    {$match: {"_id": "P1"}},
    {$graphLookup: {
        from: "foo",
        connectToField: "ref",
        connectFromField: "_id",
        startWith: "$_id", // usually value of connectFromField                                                             
        depthField: "n",
        as: "zz"
    }}
]);

to yield this:

{
    "_id" : "P1",
    "from" : "OP",
    "text" : "orig",
    "ts" : ISODate("2020-01-01T00:00:00Z"),
    "zz" : [
        {
            "_id" : "P42",
            "from" : "E",
            "ref" : "P3",
            "text" : "hello",
            "ts" : ISODate("2020-01-02T00:00:00Z"),
            "n" : NumberLong(1)
        },
        {
            "_id" : "P4",
            "from" : "C",
            "ref" : "P3",
            "text" : "maybe",
            "ts" : ISODate("2020-01-02T00:00:00Z"),
            "n" : NumberLong(1)
        },
        {
            "_id" : "P2",
            "from" : "A",
            "ref" : "P1",
            "text" : "not bad",
            "ts" : ISODate("2020-01-01T00:00:00Z"),
            "n" : NumberLong(0)
        },
        {
            "_id" : "P41",
            "from" : "D",
            "ref" : "P4",
            "text" : "hoo",
            "ts" : ISODate("2020-01-08T00:00:00Z"),
            "n" : NumberLong(2)
        },
        {
            "_id" : "P61",
            "from" : "A",
            "ref" : "P41",
            "text" : "concerned",
            "ts" : ISODate("2020-01-02T00:00:00Z"),
            "n" : NumberLong(3)
        },
        {
            "_id" : "P3",
            "from" : "B",
            "ref" : "P1",
            "text" : "corn",
            "ts" : ISODate("2020-01-01T00:00:00Z"),
            "n" : NumberLong(0)
        }
    ]
}

There are a few interesting advantages to this design:

  1. No contained recursion to worry about. The query is the same and very performant for essentially any practical depth of comment-reply dialogue.
  2. Secondary lookups on other things like dates, participants, likes/votes, etc. across multiple post threads are much easier because the structure is flat. Put an index on from for example and you can rapidly get all input from a particular actor. Finding the most recent comment in the hierarchy or the newest comments since you last looked (state somewhere of course) becomes trivial. Scanning for inappropriate language is similarly trivial because there is a single text field in the same position in every document.
  3. Because $graphLookup supplies the depthField (although acknowledged NOT in numeric order...) it becomes easy to trim the response tree. For example to get only 2 of the first level (level 0) replies, you could add this stage after $graphLookup:
    ,{$addFields: {zz: {$slice: [ {$filter: {
                                    input: "$zz",
                                    cond:  {$eq:[0,"$$this.n"]}
                                  }}, 0, 2] }
    }}

Note that the hierarchy and dates are independent. To find the latest reply to a comment in a thread, we could use the following stage. Note that 2020-01-08 is the latest entry on doc P41:

,{$addFields: {zz: {$reduce: {
        input: "$zz",
        initialValue: {ts:new ISODate("2000-01-01")}, // init to a REALLY old date                                    
            in: {$cond: [
                {$gt:["$$this.ts","$$value.ts"]}, // if higher than prev
                "$$this",  // then use this one
                "$$value"  // else keep highest seen so far
                ]}
        }}
    }}

{
    "_id" : "P1",
    "from" : "OP",
    "text" : "orig",
    "ts" : ISODate("2020-01-01T00:00:00Z"),
    "zz" : {
        "_id" : "P41",
        "from" : "D",
        "ref" : "P4",
        "text" : "hoo",
        "ts" : ISODate("2020-01-08T00:00:00Z"),
        "n" : NumberLong(2)
    }
}

Lastly, sometimes you will want to work on the initial target doc as well as the chain of $graphLookup docs. To do so, simply $project a concatentation of the top-level doc info with the array resulting from $graphLookup:

    ,{$project: {zz: {$concatArrays: [
                       // Make array of 1.  Call n -1 to identify it as the
                       // initial target doc:
                       [{_id: "$_id",n:-1,from:"$from",text:"$text",ts:"$ts"}], // make array of 1
                        "$zz"
                       ]}
    }}

{
    "_id" : "P1",
    "zz" : [
        {
            "_id" : "P1",
            "n" : -1,
            "from" : "OP",
            "text" : "orig",
            "ts" : ISODate("2020-01-01T00:00:00Z")
        },
        {
            "_id" : "P2",
            "from" : "A",
            "ref" : "P1",
            "text" : "not bad",
            "ts" : ISODate("2020-01-01T00:00:00Z"),
            "n" : NumberLong(0)
        },
....

Upvotes: 0

R2D2
R2D2

Reputation: 10707

To the first part:

   db.collection.aggregate([
   {
     $unwind: "$comments"
   },
    {
     $project: {
     _id: 0,
     items: {
       $filter: {
        input: "$comments.replies",
        as: "item",
         cond: {
          $eq: [
          "$$item.user",
          "$comments.user"
        ]
       }
      }
     }
    }
    },
    {
       $unwind: "$items"
     },
      {
      $project: {
      "self_repling_user": "$items.user",
      "self_repling_comment": "$items.text"
      }
     }
   ])

playground

Explained:

  1. Convert the comments array to object.
  2. Filter only replies where comment.user is the repling user.
  3. unwind the array with comments with own replies
  4. show only the self repling user and his comment

To the second part:

db.collection.aggregate([
   {
   $unwind: "$comments"
   },
   {
    $unwind: "$comments.replies"
   },
   {
    $group: {
    _id: "allusers",
    "user2": {
      $push: "$comments.user"
       },
      user1: {
        $push: "$comments.replies.user"
       }
      }
   },
    {
     $project: {
         users: {
           $concatArrays: [
             "$user1",
            "$user2"
          ]
        }
      }
    },
    {
     $unwind: "$users"
    },
    {
      $group: {
       _id: "$users",
      count: {
       $sum: 1
       }
      }
    },
    {
     $match: {
     count: {
       $gt: 1
     }
    }
  }
  ])

playground

explained:

  1. unwind the comments & comments.recplies arrays
  2. group to form two arrays only with the users
  3. join the two arrays in single one
  4. unwind the total user array
  5. Group by user to count number of comments or replies
  6. Filter only those that commentor reply more then once

Upvotes: 1

Related Questions