Reputation: 399
I am a novice when it comes to mongo as I have traditionally only worked with Oracle database. I have a mongo database that's storing bitbucket data in columns like so:
_id | _class | collectorItemId| firstEverCommit | scmUrl | scmBranch | scmAuthor | scmCommitTimestamp
There are a few more columns in there that I've omitted for the sake of time. For the scmBranch column, the column is populated with one of two strings: "master" or "develop".
Here is a sample of what the data looks like:
Here is the document view of one of the rows:
{
"_id" : ObjectId("5e39d6a0330c130006a042c6"),
"collectorItemId" : ObjectId("5e33a6b9887ef5000620a0c0"),
"firstEverCommit" : false,
"scmUrl" : "sampleRepo1",
"scmBranch" : "master",
"scmRevisionNumber" : "a2ad6842468eb55bffcbe7d700b6addd3eb11629",
"scmAuthor" : "son123",
"scmCommitTimestamp" : NumberLong(1580841662000)
}
I am now trying to formulate mongo queries that will get me the following data:
1. For each scmUrl, If max(scmCommitTimestamp) where scmBranch =
"develop" > max(scmCommitTimestamp) where scmBranch = "master" THEN
count the number of rows (i.e commits) where scmBranch = "develop"
AND scmCommitTimestamp > max(scmCommitTimestamp) where scmBranch =
"master"
2. For the results found in #1, find the oldest commit and newest
commit
So far, the best mongo query I've been able to come up with is the following:
db.bitbucket.aggregate([{
"$group": {
"_id": {
"scmUrl": "$scmUrl",
"scmBranch": "$scmBranch"
},
"MostRecentCommit": {
"$max": {"$toDate":"$scmCommitTimestamp"}
}
}
},{
"$project": {
"RepoName": {"$substr": ["$_id.scmUrl",39,-1]},
"Branch": "$_id.scmBranch",
"MostRecentCommit": "$MostRecentCommit"
}
},{
"$sort":{
"RepoName":1,
"Branch":1
}
}
])
But this only gets me back the most recent commit for the develop branch and the master branch of each scmUrl (i.e repo), like so:
Ideally, I'd like to get back a table of results with the following columns:
scmUrl/RepoName | Number of commits on develop branch that are not on master branch| oldest commit in develop branch that's not in master branch | newest commit in develop branch that's not in master branch
How can I modify my mongo query to extract the data that I want?
Upvotes: 2
Views: 1944
Reputation: 75964
You could try something like this.
Below query will grab the latest commit date from master for each repo. After you have the latest commit date you will join back to the same collection to pull all commits where branch is develop and has newer commits than master branch for each repo.
db.bitbucket.aggregate([
{"$match":{"scmBranch":"master"}},
{"$group":{"_id":"$scmUrl","recentcommitdate":{"$max":"$scmCommitTimestamp"}}},
{"$lookup":{
"from":"bitbucket",
"let":{"scmUrl":"$_id","recentcommitdate":"$recentcommitdate"},
"pipeline":[
{"$match":{"$expr":
{"$and":[
{"$eq":["$scmBranch","develop"]},
{"$eq":["$scmUrl","$$scmUrl"]},
{"$gte":["$scmCommitTimestamp", "$$recentcommitdate"]}
]}
}},
{"$sort":{"scmCommitTimestamp":-1}}
],
"as":"commits"
}},
{"$match":{"commits.0":{"$exists":true}}},
{"$project":{
"commits":{"$size":"$commits"},
"lastcommit":{"$arrayElemAt":["$commits",0]},
"firstcommit":{"$arrayElemAt":["$commits",-1]}
}}
])
Sample added here https://mongoplayground.net/p/wLnFY0H_nJz
Update for revision ids
db.bitbucket.aggregate([
{"$match":{"scmBranch":"master"}},
{"$group":{"_id":"$scmUrl","revisionIds":{"$push":"$scmRevisionNumber"}}},
{"$lookup":{
"from":"bitbucket",
"let":{"scmUrl":"$_id","revisionIds":"$revisionIds"},
"pipeline":[
{"$match":{"$expr":
{"$and":[
{"$eq":["$scmBranch","develop"]},
{"$eq":["$scmUrl","$$scmUrl"]},
{"$not":[{"$in":["$scmRevisionNumber","$$revisionIds"]}]}
]}
}},
{"$sort":{"scmCommitTimestamp":-1}}
],
"as":"commits"
}},
{"$match":{"commits.0":{"$exists":true}}},
{"$project":{
"commits":{"$size":"$commits"},
"lastcommit":{"$arrayElemAt":["$commits",0]},
"firstcommit":{"$arrayElemAt":["$commits",-1]}
}}
])
Upvotes: 1