Reputation: 4639
I am working on an R package that is supposed to parse git repository history on a granular level. When validating some of the parsing results with an open source project from GitHub, I encountered something unexpected. In previous validation efforts of my package, I managed to reconstruct the number of lines of each file in several git repositories correctly via the output of git log, so I thought using git log to obtain comprehensive information about changed files was a valid path to follow.
However, I found a commit in the aforementioned project where git log does not seem to convey all information about changed files: the commit with the hash 184f6c71dee03c66c7adaacb024b70d99075ea75. When resetting HEAD to this commit and running both git log --stat and git show --stat, I get this:
$ git log --stat
commit 184f6c71dee03c66c7adaacb024b70d99075ea75
Merge: 32e47a3 d203300
Author: ***
Date: Wed Nov 12 10:39:51 2014 +0100
merge changes from master branch
commit d203300bbe45981dab15b49c3c08deb31ad46466
Merge: 4b63f4e c8ae895
Author: ***
Date: Wed Nov 12 10:35:36 2014 +0100
[ output truncated ]
commit 32e47a32f3cc60b5705e9df93cdc6b730fae380b
Author: ***
Date: Tue Nov 11 18:00:55 2014 +0100
Added the internal class template `MatrixColumnVisitor` to represent
`VectorVisitor` concept for a column that is a `matrix`. Part of #602
NEWS.md | 3 +++
inst/include/dplyr.h | 1 +
inst/include/dplyr/MatrixColumnVisitor.h | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
inst/include/dplyr/VectorVisitorImpl.h | 4 +++-
inst/include/dplyr/visitor.h | 17 ++++++++++++++---
inst/include/dplyr/white_list.h | 4 ++++
tests/testthat/test-filter.r | 12 ++++++++++++
7 files changed, 204 insertions(+), 4 deletions(-)
[ output truncated ]
And
$ git show --stat
commit 184f6c71dee03c66c7adaacb024b70d99075ea75
Merge: 32e47a3 d203300
Author: ***
Date: Wed Nov 12 10:39:51 2014 +0100
merge changes from master branch
NEWS.md | 4 ++--
R/RcppExports.R | 4 ++++
R/src-sql.r | 2 +-
inst/include/dplyr/NamedListAccumulator.h | 12 ++++++------
inst/include/dplyr/Result/LazyGroupedSubsets.h | 4 ++--
inst/include/dplyr/Result/LazySubsets.h | 4 ++--
inst/include/dplyr/Result/Name.h | 46 ++++++++++++++++++++++++++++++++++++++++++++++
inst/include/dplyr/Result/all.h | 1 +
src/RcppExports.cpp | 15 +++++++++++++++
src/dplyr.cpp | 29 ++++++++---------------------
src/strings_addresses.cpp | 19 +++++++++++++++++++
tests/testthat/test-joins.r | 7 +++++++
tests/testthat/test-mutate.r | 18 ++++++++++++++++++
13 files changed, 131 insertions(+), 34 deletions(-)
This was surprising because I thought git log --stat and git show --stat would give me the same information.
This is not the case: from the git log output, I would conclude that no files were changed in the commit of interest.
When viewing the commit on GitHub or in the RStudio git tab, I can see that this commit was not empty, i.e. the information shown by git show seems correct, and it appears to me that git log is missing information for that commit.
Any idea why there is this discrepancy? As pointed out, for a large number of commits, I can correctly reproduce the number of lines of each file in a git repository from git log, but not for this one. I am running git 2.9.2 on macOS. Thanks in advance.
Upvotes: 1
Views: 610
Reputation: 490108
By default, git log skips showing diffs for merge commits, while git show shows combined diffs for merge commits. Adding --cc (show combined diffs) to the git log options tells git log to show combined diffs (or stats for them) for merges.
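The difference can be reproduced on a throwaway repository. The following is a minimal sketch, assuming git is on the PATH; the file names (a.txt, b.txt, c.txt), the branch name feature, and the demo identity are made up for illustration and have nothing to do with the project in the question. It counts the per-file stat lines that git log reports for a merge commit, once with plain --stat and once with --cc added.

```shell
#!/bin/sh
set -eu

# Throwaway repository; all file and branch names below are illustrative.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > a.txt
git add a.txt
git commit -q -m "base"
main=$(git rev-parse --abbrev-ref HEAD)

# One commit on a side branch, one on the original branch, then a true merge.
git checkout -q -b feature
echo feature > b.txt
git add b.txt
git commit -q -m "add b.txt on feature"

git checkout -q "$main"
echo main > c.txt
git add c.txt
git commit -q -m "add c.txt on main"

git merge -q --no-ff --no-edit -m "merge feature" feature

# Count the per-file stat lines ("name | N +") reported for the merge commit.
# --format= suppresses the commit header so only the stat output remains.
plain=$(git log -1 --stat --format= HEAD | grep -c '|' || true)
with_cc=$(git log -1 --stat --cc --format= HEAD | grep -c '|' || true)

echo "git log --stat:      $plain stat line(s)"
echo "git log --stat --cc: $with_cc stat line(s)"
```

With plain --stat the merge commit should contribute no stat lines at all, which matches the empty entry for 184f6c7 in the question, while the --cc run should report at least one changed file.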
Note that combined diffs have limited usefulness. For proper analysis you may want -m, which is an option that both git log and git show accept. It tells the commands to, in effect, split each merge into multiple virtual commits. A merge commit has n parents where n ≥ 2, and -m makes Git turn commit A with parents P1, P2, ..., Pn into commit A-P1 with parent P1; commit A-P2 with parent P2; ...; commit A-Pn with parent Pn, then show (or --stat) each of those commits individually.
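To see the per-parent split that -m produces, here is a small sketch along the same lines, again assuming git is on the PATH and using made-up file and branch names (a.txt, b.txt, c.txt, feature). The merge has two parents, so -m should yield two separate stats: one relative to the first parent and one relative to the second.

```shell
#!/bin/sh
set -eu

# Throwaway repository; all file and branch names below are illustrative.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > a.txt
git add a.txt
git commit -q -m "base"
main=$(git rev-parse --abbrev-ref HEAD)

# Side branch adds b.txt, original branch adds c.txt, then merge the two.
git checkout -q -b feature
echo feature > b.txt
git add b.txt
git commit -q -m "add b.txt on feature"

git checkout -q "$main"
echo main > c.txt
git add c.txt
git commit -q -m "add c.txt on main"

git merge -q --no-ff --no-edit -m "merge feature" feature

# -m diffs the merge against each parent in turn:
#   against parent 1 (original branch) the merge adds b.txt,
#   against parent 2 (feature) the merge adds c.txt.
per_parent=$(git log -1 --stat -m --format= HEAD)
with_m=$(printf '%s\n' "$per_parent" | grep -c '|' || true)

printf '%s\n' "$per_parent"
echo "stat lines with -m: $with_m"
```

For a parser that reconstructs line counts, the diff against the first parent is usually the one that describes what the merge brought into the branch being followed; which parent to use is a design decision that depends on how the history is walked.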
Upvotes: 2