Paddre

How does git diff -M work?

I am currently trying to figure out how git diff -M<limit> works.

What I found out is that git diff decides whether two files (say fileA in revision 1 and fileC in revision 2) are related by calculating a similarity score. If the score is >= limit, fileA is treated as renamed to fileC, possibly with modifications (if the score is below 100%).
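Here is a minimal Python sketch of how such a score could be computed. It assumes a line-based approximation: the score is the number of bytes whose lines survive from the old version into the new one, divided by the size of the larger version. `similarity` is a hypothetical helper, not part of git, and git itself hashes chunks of content rather than comparing whole lines, so real scores can differ. With the files from this experiment, 12 of fileA's 14 bytes survive into fileC. That gives 85, which matches the 85% in the commit output of this experiment.

```python
from collections import Counter


def similarity(old: str, new: str) -> int:
    """Approximate rename similarity score in percent (simplified sketch).

    Assumption: score = bytes of lines that survive from `old` into `new`,
    divided by the size of the larger version. Git hashes chunks of the
    content instead of comparing whole lines, so real scores can differ.
    """
    larger = max(len(old), len(new))
    if larger == 0:
        return 100  # two empty files are trivially identical
    # Multiset intersection: a line counts only as often as it occurs in both.
    shared = Counter(old.splitlines(keepends=True)) & Counter(new.splitlines(keepends=True))
    copied = sum(len(line) * count for line, count in shared.items())
    return copied * 100 // larger  # truncate to a whole percentage


file_a = "a\nb\nc\nd\ne\nf\ng\n"  # fileA as first committed (7 lines, 14 bytes)
file_c = "b\nc\nd\ne\nf\ng\n"     # fileC after deleting the first line (12 bytes)
print(similarity(file_a, file_c))  # 85
```

Dividing by the larger size means that both added and deleted content lower the score.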

Then I asked myself: what if several files in the directory have the same SHA-1 hash? How does git know which one is the renamed (and changed) version?

To find this out, I tried the following:

First, I created two files, each containing the same 7 lines ("a", "b", "c", "d", "e", "f", "g"):

vi fileA
vi fileB

Then I added them to the repository and committed:

git add fileA fileB
git commit -m "Added fileA and fileB"

    [master ffc8964] Added fileA and fileB
     2 files changed, 6 insertions(+)
     create mode 100644 tests/fileA
     create mode 100644 tests/fileB

Next, I renamed fileA to fileC using git mv and deleted the first line of both fileB and fileC. Then I committed the changes:

git mv fileA fileC
vi fileB
vi fileC
git commit -a -m "Renamed and changed files"

    [master 57ff82a] Renamed and changed filed
     2 files changed, 2 deletions(-)
     rename tests/{fileA => fileC} (85%)

fileB and fileC now look like this:

    b
    c
    d
    e
    f
    g

I expected fileB and fileC to have the same hash:

git hash-object fileB fileC
    9fbb6235d2d7eb798268d4537acebea297321241
    9fbb6235d2d7eb798268d4537acebea297321241

Indeed they are :-)

So how does git diff know which file is the renamed one? Because fileC was modified, the commit stored a new blob for it, so the hashes of fileC and fileA differ as well (obviously).
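To see why identical content always produces the same hash, here is a small Python sketch of how a blob ID is computed. Git hashes a header of the form `blob <size>` plus a NUL byte, followed by the raw content. The file name is not part of the hash, so nothing in a blob records where it came from. This sketch assumes a SHA-1 repository and no content filters (such as CRLF conversion), which `git hash-object` would otherwise apply. `blob_hash` is a hypothetical helper name.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Blob ID as `git hash-object` computes it (SHA-1 repo, no filters)."""
    # Header: the word "blob", a space, the size in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


file_a = b"a\nb\nc\nd\ne\nf\ng\n"  # original content of fileA
file_b = b"b\nc\nd\ne\nf\ng\n"     # fileB after deleting its first line
file_c = b"b\nc\nd\ne\nf\ng\n"     # fileC after deleting its first line

print(blob_hash(file_b) == blob_hash(file_c))  # True: same content, same blob
print(blob_hash(file_a) == blob_hash(file_c))  # False: content changed
```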

I tried it:

git diff -M80% HEAD master~1

The output, however, confused me :-(

    diff --git a/tests/fileC b/tests/fileA
    similarity index 85%
    rename from tests/fileC
    rename to tests/fileA
    index 9fbb623..f9d9a01 100644
    --- a/tests/fileC
    +++ b/tests/fileA
    @@ -1,3 +1,4 @@
    +a
     b
     c
     d
    diff --git a/tests/fileB b/tests/fileB
    index 9fbb623..f9d9a01 100644
    --- a/tests/fileB
    +++ b/tests/fileB
    @@ -1,3 +1,4 @@
    +a
     b
     c
     d

Apparently git diff DID find out that fileA has been renamed to fileC.

But how? Did git save some kind of connection between fileA and fileC?

Answers (1)

user743382

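No such connection was saved. Renames can only be detected when one file is deleted and another is added. Git saw that fileC was newly added and went through all deleted files to see whether it might have been a rename. Here, the only deleted file was fileA, so it was a rather quick check. fileB was never a candidate: it exists in both commits, so it counts as modified rather than deleted, even though its new content is identical to fileC's.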
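The procedure described above can be sketched in Python, under simplifying assumptions. Trees are modeled as hypothetical `path -> content` dicts, and the score uses a line-based approximation rather than git's chunk hashing. Candidate pairs are ranked by score, and each source and target is used at most once. The sketch ignores details such as git's exact-match pass and rename limits. `similarity` and `detect_renames` are illustrative names, not git APIs. The default threshold of 50 mirrors git's documented default for `-M`.

```python
from collections import Counter
from typing import Dict, List, Tuple


def similarity(old: str, new: str) -> int:
    """Line-based approximation of a similarity score, in percent."""
    larger = max(len(old), len(new))
    if larger == 0:
        return 100
    shared = Counter(old.splitlines(keepends=True)) & Counter(new.splitlines(keepends=True))
    copied = sum(len(line) * count for line, count in shared.items())
    return copied * 100 // larger


def detect_renames(old_tree: Dict[str, str], new_tree: Dict[str, str],
                   threshold: int = 50) -> List[Tuple[str, str, int]]:
    """Pair deleted paths with added paths whose similarity >= threshold."""
    # Only paths missing from the new tree can be rename sources, and only
    # paths missing from the old tree can be rename targets. A path present
    # in both trees (like fileB) is a modification and is never considered.
    deleted = [p for p in old_tree if p not in new_tree]
    added = [p for p in new_tree if p not in old_tree]

    candidates = []
    for dst in added:
        for src in deleted:
            score = similarity(old_tree[src], new_tree[dst])
            if score >= threshold:
                candidates.append((score, src, dst))

    # Highest scores first; each source and each target is used at most once.
    candidates.sort(key=lambda c: c[0], reverse=True)
    used_src, used_dst = set(), set()
    renames = []
    for score, src, dst in candidates:
        if src in used_src or dst in used_dst:
            continue
        used_src.add(src)
        used_dst.add(dst)
        renames.append((src, dst, score))
    return renames


seven = "a\nb\nc\nd\ne\nf\ng\n"
six = "b\nc\nd\ne\nf\ng\n"
old_tree = {"tests/fileA": seven, "tests/fileB": seven}
new_tree = {"tests/fileB": six, "tests/fileC": six}
print(detect_renames(old_tree, new_tree, threshold=80))
# [('tests/fileA', 'tests/fileC', 85)]
```

Even though fileB's new content is identical to fileC's, the sketch never compares them, because fileB is neither deleted nor added.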

Note: it only has to go through the deleted files, because otherwise you wouldn't have a rename, you'd have a copy. Copies can be detected too, and it works roughly the same way, but they're covered by a separate option (-C).
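The difference between rename and copy detection lies mainly in which old files are allowed as sources. Here is a rough Python sketch based on git's documentation. With `-M`, only deleted files qualify. With `-C`, files modified in the same changeset also qualify. With `-C --find-copies-harder`, unmodified files qualify as well. The function name and the `mode` strings are made up for this illustration, and trees are modeled as `path -> content` dicts.

```python
from typing import Dict, Set


def source_candidates(old_tree: Dict[str, str], new_tree: Dict[str, str],
                      mode: str = "rename") -> Set[str]:
    """Old paths that may serve as a rename/copy source (illustrative only).

    mode: "rename" (like -M), "copy" (like -C) or
    "copy-harder" (like -C --find-copies-harder). Hypothetical names.
    """
    deleted = {p for p in old_tree if p not in new_tree}
    if mode == "rename":
        return deleted
    modified = {p for p in old_tree if p in new_tree and old_tree[p] != new_tree[p]}
    if mode == "copy":
        return deleted | modified
    if mode == "copy-harder":
        return set(old_tree)  # deleted, modified and unmodified files
    raise ValueError(f"unknown mode: {mode}")


seven = "a\nb\nc\nd\ne\nf\ng\n"
six = "b\nc\nd\ne\nf\ng\n"
old_tree = {"tests/fileA": seven, "tests/fileB": seven, "tests/fileD": "x\n"}
new_tree = {"tests/fileB": six, "tests/fileC": six, "tests/fileD": "x\n"}

print(sorted(source_candidates(old_tree, new_tree, "rename")))       # ['tests/fileA']
print(sorted(source_candidates(old_tree, new_tree, "copy")))         # ['tests/fileA', 'tests/fileB']
print(sorted(source_candidates(old_tree, new_tree, "copy-harder")))  # ['tests/fileA', 'tests/fileB', 'tests/fileD']
```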
