Reputation: 318
I recently learned about git blame and what it does. I want to know how git finds when each line was changed in a file, even across file renames. In other words, I want to know how the blame algorithm works.
Upvotes: 2
Views: 417
Reputation: 3897
First of all, a blame feature exists in almost all other SCMs too, including CVS, so the algorithm used will vary according to the tool you're using.
Basically, however, the simplest way to achieve this is to start from the most recent state of your file, then walk the history backwards (toward the past), applying the reverse of each changeset.
Every row affected by the latest changeset is marked as belonging to the last commit (n), and all other rows are provisionally assigned to the previous one (n-1). Besides this, you keep a count of the rows in that second group. Then you repeat the process with commits n-1 and n-2. Rows that are not currently assigned to n-1 are ignored, because this means a more recent commit has already altered them (the reverse changeset is still applied to them, but their commit number isn't updated). For the remaining rows, you apply the same computation, updating the commit number each row belongs to.
You then just iterate this way, down to the initial commit if needed. If the row count mentioned above reaches zero, however, you can stop right there: every row has been altered since that state of the file, so there is no need to go any further back.
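As a rough illustration, here is a minimal Python sketch of that backward walk. It is not git's actual implementation: it assumes a strictly linear history passed in as a list of (commit id, lines) pairs, uses the standard library's difflib to compare each version with its parent, and does not follow renames. The names blame, history and pending are made up for the example.

```python
from difflib import SequenceMatcher


def blame(history):
    """Attribute each line of the newest version to a commit.

    history: list of (commit_id, lines) pairs, oldest first (assumed linear).
    Returns a list with one commit_id per line of the newest version.
    """
    newest_lines = history[-1][1]
    result = [None] * len(newest_lines)
    # pending maps a line index in the version being inspected to the
    # index of the same line in the newest version (still unattributed).
    pending = {i: i for i in range(len(newest_lines))}

    # Walk from the newest commit toward the oldest one.
    for k in range(len(history) - 1, 0, -1):
        commit_id, lines = history[k]
        parent_lines = history[k - 1][1]
        matcher = SequenceMatcher(None, parent_lines, lines, autojunk=False)

        # Lines unchanged between parent and child are carried back
        # to the parent, keeping track of their final position.
        carried = {}
        for a, b, size in matcher.get_matching_blocks():
            for offset in range(size):
                if b + offset in pending:
                    carried[a + offset] = pending.pop(b + offset)

        # Whatever could not be carried back was introduced by this commit.
        for final_index in pending.values():
            result[final_index] = commit_id

        pending = carried
        if not pending:
            break  # every line is attributed, no need to go further back

    # Lines that survived all the way belong to the oldest commit.
    for final_index in pending.values():
        result[final_index] = history[0][0]
    return result


history = [
    ("c1", ["a", "b", "c"]),
    ("c2", ["a", "B", "c"]),
    ("c3", ["a", "B", "c", "d"]),
]
print(blame(history))  # ['c1', 'c2', 'c1', 'c3']
```

The pending dictionary plays the role of the row count described above: once it is empty, the walk stops early.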
Upvotes: 2