iphipps

Reputation: 555

Obtain historical statistics of contributions in a Git repository (of lines still in use)

Developers make commits, merge branches, overwrite each other's code. They come in and refactor or add new features. Code is iterative.

Is there a good git mechanism for finding which developer has the most utilized code in the current branch?

For example:

  1. Developer X creates 100 lines in a file.
  2. Developers Y and Z each refactor 50 lines of code.

git blame for the 100-line file will show developers Y and Z as the authors of 50 lines of code each. According to the same blame, developer X is responsible for 0 lines of code.

However, we know that X wrote the original 100 lines. Therefore, I would like to know the contribution of each developer.

Considering the current lines of code still in use: is there a good git mechanism for finding who has contributed the most through the history of those lines?

Upvotes: 1

Views: 205

Answers (1)

Wojciech Kaczmarek

Reputation: 2342

Very interesting question! Thanks for asking - it's what I'd also need for my team occasionally.

I came up with a quick-and-dirty approximation of what you want:

The idea is to grep the author lines out of the output of git blame --line-porcelain.

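# Requires bash (read -d ''); NUL-delimited names keep paths with spaces intact.
git ls-files -z | while IFS= read -r -d '' f; do
   git blame --line-porcelain -- "$f" | grep '^author '; done |
awk '{cnt[$0]++} END {for (x in cnt) print cnt[x], x}' | sort -rn -k1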

It's not rocket science, nor is it a speed demon. It's just an aggregation over the output of git blame for all files in the repository, using standard Unix tools. But it gives some neat output.

I checked the numbers: the sum of the aggregated counters equals the sum of the line counts of all files returned by git ls-files, which is what we expect.
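That sanity check could be sketched roughly as follows. The helper name sum_counts is made up for illustration; it only adds up the first column of the "count author name" lines that the pipeline prints. The two sample lines fed to it are the first two rows of the Elixir output shown in this answer.

```shell
# Hypothetical helper: add up the leading counters of lines such as
# "97037 author José Valim" read from stdin.
sum_counts() {
  awk '{total += $1} END {print total + 0}'
}

# Demo on two rows of the sample output.
printf '97037 author José Valim\n3151 author Aleksei Magusev\n' | sum_counts
```

In a real repository you would pipe the full output of the aggregation into sum_counts and compare the result with something like git ls-files -z | xargs -0 cat | wc -l. Treat that comparison as approximate: wc -l counts newline characters, so a file without a trailing newline can make the two totals differ by one.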

Example output for the Elixir language repository:

97037 author José Valim
3151 author Aleksei Magusev
3017 author Alexei Sholik
3003 author James Fish
2837 author Bryan Enders
2677 author Eric Meadows-Jönsson
2667 author eksperimental
1604 author Andrea Leopardi
1109 author Bryan Endersstocker
1073 author Eric Meadows-Jonsson
1058 author Yurii Rashkovskii
901 author Yuki Ito
828 author Rafael Mendonça França
735 author John Warwick
689 author Paulo Almeida
[...]

Note the use of git blame --line-porcelain rather than git blame --incremental. The latter emits one record per group of lines attributed to a commit, not one record per line, so counting its author lines would give wrong totals.
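To make the difference concrete, here is a small sketch on hand-written, heavily abbreviated samples. Real blame output carries more header fields (author-mail, committer, summary and so on), and the commit id and the helper name count_author_lines are made up. Both samples describe the same two-line file written by a single commit.

```shell
# Hypothetical helper: count header lines starting with "author " on stdin.
count_author_lines() {
  grep -c '^author '
}

sha=1111111111111111111111111111111111111111  # made-up commit id

# --line-porcelain style: the commit header is repeated for every line.
line_porcelain=$(printf '%s 1 1 2\nauthor Dev X\nfilename a.txt\n\tfirst line\n%s 2 2\nauthor Dev X\nfilename a.txt\n\tsecond line\n' "$sha" "$sha")

# --incremental style: one entry for the whole two-line group, no content lines.
incremental=$(printf '%s 1 1 2\nauthor Dev X\nfilename a.txt\n' "$sha")

printf '%s\n' "$line_porcelain" | count_author_lines   # prints 2
printf '%s\n' "$incremental" | count_author_lines      # prints 1
```

With the per-line format the count matches the number of lines in the file; with the grouped format Dev X would be credited with one line instead of two.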

[EDIT note] For anyone who saw the original answer: it contained a bug, which was explained together with a proper solution. The community later edited the answer to shorten it.

Upvotes: 3
