Reputation: 40048
Given a range of commits, say HEAD~1 and HEAD (i.e., just HEAD), I want to find the previous authors of the lines that were changed in that range and how many lines they changed.
More precisely: for each line that was changed in the range, I want to get the previous author (using git blame, for example). Then I want to group by these authors, summing up the changed lines.
For example, consider the file X that was changed by these people before HEAD (I marked the people that changed the lines at the beginning of each line, comparable to git blame's output):
Adam: Lorem ipsum dolor
Adam: sit amet, consectetur
Adam: adipiscing elit.
Bob: Praesent efficitur urna
Bob: ac volutpat lacinia.
Bob: Sed sagittis, metus non
Adam: maximus tristique, leo
Adam: augue venenatis enim,
Adam: ac rutrum nulla odio
Adam: id urna.
Now, author Carl changes the file as follows (note that this is a pseudocode mixture of git blame and git diff):
Adam: Lorem ipsum dolor
Adam: sit amet, consectetur
- Adam: adipiscing elit.
+ Carl: adipiscing elit I love cats.
- Bob: Praesent efficitur urna
+ Carl: Praesent efficitur urna :D
- Bob: ac volutpat lacinia.
+ Carl: ac volutpat lacinia YOLO.
+ Carl: Added extra line, lol!
- Bob: Sed sagittis, metus non
Adam: maximus tristique, leo
Adam: augue venenatis enim,
Adam: ac rutrum nulla odio
Adam: id urna.
So Carl changed 2 lines from Bob, deleted one line from Bob, and changed one line from Adam. Thus, the output of my script should be:
Bob: 3
Adam: 1
My overall solution would be:
1. Get the line ranges that were changed by the diff
2. Pass these ranges to the -L parameter of git blame to query for the previous author
3. Parse git blame's output and sum up
I am currently struggling with 1.: getting the line ranges that were changed by the diff (in this case, one range: 3,6). Once I have these ranges, I can pass them to git blame -L to get the previous authors of these lines. So how can I make git diff or another git tool return the line ranges as numerical start,end pairs?
Upvotes: 3
Views: 535
Reputation: 10217
I don't know of a way to tell Git to do this, but I hacked together a solution that parses the output of git diff to get the values you need.
If you run git diff -U0, at the top of each chunk you will see something like this:
@@ -5,2 +5,3 @@
which means that 2 lines were deleted starting at line 5, and 3 were added there. (The -U0 parameter for git diff hides all context lines, so that only the lines that actually changed are printed. Without that parameter the ranges in the header would also cover the unchanged context lines.) There are three different scenarios that could occur for a given chunk: lines were added, lines were removed, or lines were modified (removed & added). The previous example shows what the header looks like for modified lines. Added lines would look like this:
@@ -5,0 +6,2 @@
For your use case, we can ignore such lines. Removed lines would look like this:
@@ -5,5 +4,0 @@
Notice that the second number in each pair is a line count, showing how many lines were removed/added, rather than an end line number. Thankfully, git blame can also accept a relative +<offset> for the <end> value, so we can massage this into a format that git blame can accept.
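To make the header anatomy concrete, here is a minimal sketch that splits one chunk header into its four numbers and labels it as added, removed, or modified. The function name parse_hunk_header is made up for illustration (it is not a git command), and the sketch assumes a POSIX shell with awk. It relies on the fact that git diff omits the count when it is exactly 1.

```shell
# parse_hunk_header is a hypothetical helper, not part of git.
# Input:  one chunk header line, e.g. "@@ -5,2 +5,3 @@"
# Output: old_start old_count new_start new_count kind
parse_hunk_header() {
    printf '%s\n' "$1" | awk '
        /^@@ / {
            split(substr($2, 2), o, ",")   # "-5,2" -> o[1]=5, o[2]=2
            split(substr($3, 2), n, ",")   # "+5,3" -> n[1]=5, n[2]=3
            # git omits the count when it is exactly 1
            if (o[2] == "") o[2] = 1
            if (n[2] == "") n[2] = 1
            kind = "modified"
            if (o[2] == 0) kind = "added"     # nothing existed on the old side
            if (n[2] == 0) kind = "removed"   # nothing is left on the new side
            print o[1], o[2], n[1], n[2], kind
        }'
}
```

Only the old-side pair (the first two numbers) matters for git blame on the previous revision, and chunks labelled "added" have no old lines to blame.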
Here is a bash one-liner that should do the trick:
git diff -U0 HEAD~1 HEAD -- "$file" | grep "^@@" | grep -Ev "^@@ -[[:digit:]]+,0 " | sed 's/^@@ //' | sed 's/ @@.*//' | cut -d' ' -f 1 | sed 's/[+-]//' | awk '{ if ($1 !~ /,/) { print $1",1" } else { print $1 } }' | sed 's/,/,+/'
Explanation:
$file is the current file you are processing.
The first grep command limits the output to the chunk headers, and the second grep command removes chunks representing added lines.
The first two sed commands remove everything but the range line numbers.
cut is used to get the first range value, i.e. the lines that existed in HEAD~1 that don't exist in HEAD.
The next sed command strips the leading status character.
If only one line is added or removed in a given chunk, git diff will use e.g. +2 as the range instead of +2,1. The awk command fixes that.
Finally, the last sed command replaces , with ,+ so that git blame knows the second value is an offset instead of a line number.
You can use each line of the output of the one-liner (saved to e.g. $row) as follows:
git blame -L "$row" HEAD~1 -- "$file"
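Putting the pieces together, the ranges can be fed to git blame and tallied per author. The following is only a sketch under a few assumptions: count_authors and previous_authors are made-up names, a POSIX shell is assumed, and it relies on git blame --line-porcelain printing an "author <name>" header for every blamed line. It also extracts the ranges with sed instead of the grep/cut/awk chain, which is a swapped-in variant of the same idea and not the exact one-liner.

```shell
# Hypothetical helpers; the names are illustrative, not git commands.

# Reads `git blame --line-porcelain` output on stdin and prints
# "<author>: <count>" per author, highest count first.
count_authors() {
    # Header lines look like "author Adam". File content lines always start
    # with a tab in porcelain output, so they cannot match this pattern.
    sed -n 's/^author //p' | sort | uniq -c | sort -rn |
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 ": " n }'
}

# Blames every changed range of one file at the old revision.
# Usage: previous_authors <old-rev> <new-rev> <file>
previous_authors() {
    git diff -U0 "$1" "$2" -- "$3" |
        sed -n 's/^@@ -\([0-9,]*\) .*/\1/p' |   # keep only the old-side range
        grep -v ',0$' |                          # drop pure additions
        sed -e '/,/!s/$/,1/' -e 's/,/,+/' |      # N -> N,+1 and N,M -> N,+M
        while read -r range; do
            git blame --line-porcelain -L "$range" "$1" -- "$3"
        done | count_authors
}
```

A call such as previous_authors HEAD~1 HEAD "$file" is meant to produce the per-author totals in the "Bob: 3" style asked for in the question. Renamed or newly created files are not handled by this sketch.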
Upvotes: 4