How to find the previous authors of all changed lines in git?

Question

Given a range of commits, say HEAD~1..HEAD (i.e., just the changes introduced by HEAD), I want to find the previous authors of the lines that were changed in that range, and how many lines of each author were changed.

More precisely: for each line that was changed in the range, I want to get the previous author (using git blame, for example). Then I want to group by these authors summing up the changed lines.

For example, consider the file X that was changed by these people before HEAD (I marked the people that changed the lines at the beginning of the line, comparable to git blame's output):

Adam: Lorem ipsum dolor 
Adam: sit amet, consectetur
Adam: adipiscing elit.
Bob:  Praesent efficitur urna
Bob:  ac volutpat lacinia.
Bob:  Sed sagittis, metus non
Adam: maximus tristique, leo
Adam: augue venenatis enim,
Adam: ac rutrum nulla odio
Adam: id urna.

Now, author Carl changes the file as follows (note that this is a pseudocode mixture of git blame and git diff):

Adam: Lorem ipsum dolor 
Adam: sit amet, consectetur
- Adam: adipiscing elit.
+ Carl: adipiscing elit I love cats.
- Bob:  Praesent efficitur urna
+ Carl: Praesent efficitur urna :D
- Bob:  ac volutpat lacinia.
+ Carl: ac volutpat lacinia YOLO.
+ Carl: Added extra line, lol!
- Bob:  Sed sagittis, metus non
Adam: maximus tristique, leo
Adam: augue venenatis enim,
Adam: ac rutrum nulla odio
Adam: id urna.

So Carl changed 2 lines from Bob, deleted one line from Bob, and changed one line from Adam. Thus, the output of my script should be:

Bob: 3
Adam: 1

My overall solution would be:

1. Find the changed line ranges.
2. Pass these ranges with the -L parameter to git blame to query for the previous author.
3. Do the final grouping myself by parsing git blame's output and summing up.

I am currently struggling with step 1: getting the line ranges that were changed by the diff (in this case, one range: 3,6). Once I have these ranges, I can pass them to git blame -L to get the previous authors of these lines. So how can I make git diff or another git tool return the line ranges as numerical start,end pairs?

Scott Weldon · Accepted Answer

I don't know of a way to tell Git to do this, but I hacked together a solution to parse the output of git diff to get the values you need.

If you run git diff -U0, at the top of each chunk you will see something like this:

@@ -5,2 +5,3 @@

which means that 2 lines starting at line 5 of the old file were replaced by 3 lines starting at line 5 of the new file. (The -U0 parameter for git diff hides all context lines, so that only the lines that actually changed are printed. Without that parameter, the ranges in the chunk header would also cover the context lines.)

There are three different scenarios that could occur for a given chunk: lines were added, lines were removed, or lines were modified (removed and added). The previous example shows what the header would show for modified lines. Added lines would look like this:

@@ -5,0 +6,2 @@

For your use case, we can ignore such chunks, because added lines have no previous author. Removed lines would look like this:

@@ -5,5 +4,0 @@

Notice that the second number in each pair is a line count, showing how many lines were removed/added, rather than an end line number. Thankfully, git blame -L can also accept an offset for the end value (start,+count), so we can massage this into a format that git blame can accept.
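To make the three scenarios concrete, here is a minimal sketch that classifies a single chunk header. The function name classify_hunk is made up for illustration and is not part of Git; it only looks at the two line counts, and treats a missing count as 1.

```shell
# classify_hunk (hypothetical helper): takes one "@@ -a[,b] +c[,d] @@"
# header as its argument and prints "added", "removed" or "modified".
classify_hunk() {
    printf '%s\n' "$1" | awk '
        /^@@ -[0-9]+(,[0-9]+)? [+][0-9]+(,[0-9]+)? @@/ {
            # $2 is the old range (e.g. "-5,2"), $3 the new one ("+5,3").
            old_parts = split(substr($2, 2), o, ",")
            new_parts = split(substr($3, 2), n, ",")
            # A range without ",count" covers exactly one line.
            old_count = (old_parts == 2) ? o[2] + 0 : 1
            new_count = (new_parts == 2) ? n[2] + 0 : 1
            if (old_count == 0)      print "added"
            else if (new_count == 0) print "removed"
            else                     print "modified"
            found = 1
        }
        END { exit !found }   # non-zero exit if no header was seen
    '
}
```

Only the "removed" and "modified" cases have lines in the old revision that git blame could be asked about.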

Here is a bash one-liner that should do the trick:

git diff -U0 HEAD~1 HEAD -- "$file" | grep "^@@" | grep -Ev "@@ -[[:digit:]]+,0" | sed 's/^@@ //' | sed 's/ @@.*//' | cut -d' ' -f 1 | sed 's/[+-]//' | awk '{ if ($1 !~ /,/) { print $1",1" } else { print $1 } }' | sed 's/,/,+/'

Explanation:

$file is the current file you are processing.
The first grep command limits the output to the chunk headers, and the second grep command removes chunks representing added lines.
The first two sed commands remove everything but the range line numbers.
cut is used to get the first range value, i.e. the lines that existed in HEAD~1 that don't exist in HEAD.
The next sed command strips the leading status character.
If only one line is added or removed in a given chunk, git diff will use e.g. +2 as the range instead of +2,1. The awk command fixes that.
Finally, the last sed command replaces , with ,+ so that git blame knows the second value is an offset instead of a line number.
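If the long pipeline is hard to follow, the same filtering steps can probably be folded into one awk call. This is only a sketch under the assumptions described above (plain two-way git diff -U0 output on stdin); the function name hunks_to_blame_ranges is invented here.

```shell
# hunks_to_blame_ranges (hypothetical helper): reads `git diff -U0`
# output on stdin and prints one "start,+count" range for every chunk
# that removes or modifies lines of the old file.
hunks_to_blame_ranges() {
    awk '
        /^@@ / {
            # $2 is the old range, e.g. "-5,2" or "-5".
            parts = split(substr($2, 2), r, ",")
            # No ",count" means the chunk touches exactly one line.
            count = (parts == 2) ? r[2] + 0 : 1
            # A count of 0 is a pure addition: nothing to blame.
            if (count > 0)
                print r[1] ",+" count
        }
    '
}
```

It would be used as git diff -U0 HEAD~1 HEAD -- "$file" | hunks_to_blame_ranges. For the example in the question, the chunk header should be @@ -3,4 +3,4 @@, which this turns into 3,+4 (lines 3 to 6).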

You can use each line of the output of the one-liner (saved to e.g. $row) as follows:

git blame -L "$row" HEAD~1 -- "$file"
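For the final grouping step from the question, one possible approach is to ask git blame for --line-porcelain output, which repeats an "author <name>" line for every blamed line, and count those lines. The sketch below assumes the range HEAD~1..HEAD and a single file; count_authors and blame_changed_lines are hypothetical names, not Git commands.

```shell
# count_authors (hypothetical helper): reads `git blame --line-porcelain`
# output on stdin and prints "<author>: <lines>", highest count first.
count_authors() {
    # Only header lines start with "author "; file content is tab-indented.
    sed -n 's/^author //p' | sort | uniq -c | sort -rn |
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 ": " n }'
}

# blame_changed_lines (hypothetical helper): takes a file path and
# tallies the previous authors of the lines that HEAD changed in it.
blame_changed_lines() {
    git diff -U0 HEAD~1 HEAD -- "$1" |
        # Keep "start count" of the old range; count is empty for one line.
        sed -nE 's/^@@ -([0-9]+)(,([0-9]+))? .*/\1 \3/p' |
        while read -r start count; do
            count=${count:-1}
            # Pure additions have no previous author.
            [ "$count" -eq 0 ] && continue
            git blame --line-porcelain -L "$start,+$count" HEAD~1 -- "$1"
        done | count_authors
}

# Example call (assumes a file named X that exists in HEAD~1):
#   blame_changed_lines X
```

Looping blame_changed_lines over the output of git diff --name-only HEAD~1 HEAD would extend this to every changed file, although the per-file tallies would then still need to be summed.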
