user3266259
user3266259

Reputation: 399

How to compare documents in the same collection?

I am a novice when it comes to mongo as I have traditionally only worked with Oracle database. I have a mongo database that's storing bitbucket data in columns like so:

_id | _class | collectorItemId| firstEverCommit | scmUrl | scmBranch | scmAuthor | scmCommitTimestamp

There are a few more columns in there that I've omitted for the sake of time. For the scmBranch column, the column is populated with one of two strings: "master" or "develop". Here is a sample of what the data looks like: enter image description here

Here is the document view of one of the rows:

{
"_id" : ObjectId("5e39d6a0330c130006a042c6"),
"collectorItemId" : ObjectId("5e33a6b9887ef5000620a0c0"),
"firstEverCommit" : false,
"scmUrl" : "sampleRepo1",
"scmBranch" : "master",
"scmRevisionNumber" : "a2ad6842468eb55bffcbe7d700b6addd3eb11629",
"scmAuthor" : "son123",
"scmCommitTimestamp" : NumberLong(1580841662000)
}

I am now trying to formulate mongo queries that will get me the following data:

 1. For each scmUrl, If max(scmCommitTimestamp) where scmBranch =
    "develop" > max(scmCommitTimestamp) where scmBranch = "master" THEN
    count the number of rows (i.e commits) where scmBranch = "develop"
    AND scmCommitTimestamp > max(scmCommitTimestamp) where scmBranch =
    "master"

 2. For the results found in #1, find the oldest commit and newest
    commit

So far, the best mongo query I've been able to come up with is the following:

db.bitbucket.aggregate([{
    "$group": {
        "_id": {
            "scmUrl": "$scmUrl",
            "scmBranch": "$scmBranch"
        },
        "MostRecentCommit": {
            "$max": {"$toDate":"$scmCommitTimestamp"}
        }
    }
},{
    "$project": {
        "RepoName": {"$substr": ["$_id.scmUrl",39,-1]},
        "Branch": "$_id.scmBranch",
        "MostRecentCommit": "$MostRecentCommit"
    }
},{
   "$sort":{
       "RepoName":1,
       "Branch":1
       }

}
])

But this only gets me back the most recent commit for the develop branch and the master branch of each scmUrl (i.e repo), like so: enter image description here

Ideally, I'd like to get back a table of results with the following columns:

scmUrl/RepoName | Number of commits on develop branch that are not on master branch| oldest commit in develop branch that's not in master branch | newest commit in develop branch that's not in master branch

How can I modify my mongo query to extract the data that I want?

Upvotes: 2

Views: 1944

Answers (1)

s7vr
s7vr

Reputation: 75964

You could try something like this.

Below query will grab the latest commit date from master for each repo. After you have the latest commit date you will join back to the same collection to pull all commits where branch is develop and has newer commits than master branch for each repo.

db.bitbucket.aggregate([
  {"$match":{"scmBranch":"master"}},
  {"$group":{"_id":"$scmUrl","recentcommitdate":{"$max":"$scmCommitTimestamp"}}},
  {"$lookup":{
   "from":"bitbucket",
    "let":{"scmUrl":"$_id","recentcommitdate":"$recentcommitdate"},
    "pipeline":[
      {"$match":{"$expr":
        {"$and":[
          {"$eq":["$scmBranch","develop"]},
          {"$eq":["$scmUrl","$$scmUrl"]},
          {"$gte":["$scmCommitTimestamp", "$$recentcommitdate"]}
        ]}
      }},
      {"$sort":{"scmCommitTimestamp":-1}}
    ],
  "as":"commits"
  }},
  {"$match":{"commits.0":{"$exists":true}}},
  {"$project":{
     "commits":{"$size":"$commits"},
     "lastcommit":{"$arrayElemAt":["$commits",0]},
     "firstcommit":{"$arrayElemAt":["$commits",-1]}
  }}
])

Sample added here https://mongoplayground.net/p/wLnFY0H_nJz

Update for revision ids

db.bitbucket.aggregate([
  {"$match":{"scmBranch":"master"}},
  {"$group":{"_id":"$scmUrl","revisionIds":{"$push":"$scmRevisionNumber"}}},
  {"$lookup":{
   "from":"bitbucket",
    "let":{"scmUrl":"$_id","revisionIds":"$revisionIds"},
    "pipeline":[
      {"$match":{"$expr":
        {"$and":[
          {"$eq":["$scmBranch","develop"]},
          {"$eq":["$scmUrl","$$scmUrl"]},
          {"$not":[{"$in":["$scmRevisionNumber","$$revisionIds"]}]}
        ]}
      }},
      {"$sort":{"scmCommitTimestamp":-1}}
    ],
  "as":"commits"
  }},
  {"$match":{"commits.0":{"$exists":true}}},
  {"$project":{
     "commits":{"$size":"$commits"},
     "lastcommit":{"$arrayElemAt":["$commits",0]},
     "firstcommit":{"$arrayElemAt":["$commits",-1]}
  }}
])

Upvotes: 1

Related Questions