I am currently trying to figure out how git diff -M<limit> works.
What I found out is that git diff checks how similar two files (say fileA in revision 1 and fileC in revision 2) are by calculating a similarity score. If the similarity score is >= limit, fileA is considered to have been renamed to fileC, which may also have been modified (if the score is < 100%).
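To make the threshold idea concrete, here is a minimal Python sketch. It is not Git's actual scoring code: it uses difflib's line-based ratio as a stand-in, and the names similarity_percent and looks_like_rename are made up for this illustration. The default of 50 mirrors the 50% that Git uses when -M is given without a number.

```python
from difflib import SequenceMatcher


def similarity_percent(old_text: str, new_text: str) -> int:
    """Rough line-based similarity in percent (a stand-in, not Git's algorithm)."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return int(SequenceMatcher(None, old_lines, new_lines).ratio() * 100)


def looks_like_rename(old_text: str, new_text: str, limit_percent: int = 50) -> bool:
    """Treat the pair as a rename candidate if the score reaches the limit."""
    return similarity_percent(old_text, new_text) >= limit_percent


# Assumed file contents, following the experiment described in this question.
file_a = "a\nb\nc\nd\ne\nf\ng\n"   # fileA in revision 1
file_c = "b\nc\nd\ne\nf\ng\n"      # fileC in revision 2 (first line removed)

print(similarity_percent(file_a, file_c))
print(looks_like_rename(file_a, file_c, limit_percent=80))
```

The percentage this stand-in prints will generally differ from the one Git reports, because the measure is different; only the "score >= limit" decision is the point here.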
Then I asked myself: what if there are more files with the same SHA-1 hash within the directory? How does git know which one is the renamed (and changed) version?
To find this out, I tried the following:
First, I created two files, each with 7 lines ("a", "b", "c", "d", "e", "f", "g"):
vi fileA
vi fileB
Then I added them to the repository and committed:
git add fileA fileB
git commit -m "Added fileA and fileB"
[master ffc8964] Added fileA and fileB
2 files changed, 6 insertions(+)
create mode 100644 tests/fileA
create mode 100644 tests/fileB
Next, I renamed fileA to fileC using git mv and deleted the first line in fileB and fileC. After that I committed the changes:
git mv fileA fileC
vi fileB
vi fileC
git commit -a -m "Renamed and changed files"
[master 57ff82a] Renamed and changed files
2 files changed, 2 deletions(-)
rename tests/{fileA => fileC} (85%)
fileB and fileC now look like this:
b
c
d
e
f
g
What I expected now is that the checksums of fileB and fileC are equal:
git hash-object fileB fileC
9fbb6235d2d7eb798268d4537acebea297321241
9fbb6235d2d7eb798268d4537acebea297321241
Indeed they are :-)
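The reason the two hashes match is that a blob ID depends only on the file content, not on the file name. A small sketch of how git hash-object derives the ID for a blob (SHA-1 over a "blob <size>" header, a NUL byte, and the content); the function name git_blob_hash and the file contents are assumptions made for this example:

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """SHA-1 of 'blob <size>\\0' + content, as used for Git blob objects."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# Assumed contents of fileB and fileC after the first line was deleted.
file_b = b"b\nc\nd\ne\nf\ng\n"
file_c = b"b\nc\nd\ne\nf\ng\n"

# Same bytes, same blob ID, regardless of the path the file lives at.
print(git_blob_hash(file_b) == git_blob_hash(file_c))
```

Because the path is not part of the hash, the blob ID alone cannot tell which of two identical files is "the renamed one".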
So how should git diff now know which file is the renamed one? Since fileC has been changed, a new blob has been generated by the commit, and the checksums of fileC and fileA are different as well (obviously).
I tried it:
git diff -M80% HEAD master~1
The output however confused me :-(
diff --git a/tests/fileC b/tests/fileA
similarity index 85%
rename from tests/fileC
rename to tests/fileA
index 9fbb623..f9d9a01 100644
--- a/tests/fileC
+++ b/tests/fileA
@@ -1,3 +1,4 @@
+a
b
c
d
diff --git a/tests/fileB b/tests/fileB
index 9fbb623..f9d9a01 100644
--- a/tests/fileB
+++ b/tests/fileB
@@ -1,3 +1,4 @@
+a
b
c
d
Apparently git diff DID find out that fileA has been renamed to fileC.
But how? Did git save some kind of connection between fileA and fileC?
Upvotes: 2
Views: 596
Reputation:
No such connection was saved. Renames can be detected when one file is deleted and another is added. Git saw that fileC was newly added, and went through all deleted files to see if it might have been a rename. Here, the only deleted file was fileA, so it was a rather quick check.
Note: it only has to go through the deleted files, because otherwise you wouldn't have a rename, you'd have a copy. Copies can be detected too, and it works roughly the same way, but they're covered by a separate option (-C).
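The check described above can be sketched in Python. This is a simplified, greedy illustration under stated assumptions, not Git's implementation: trees are modelled as plain dicts from path to content, difflib's line ratio stands in for Git's similarity score, and detect_renames is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity_percent(old_text: str, new_text: str) -> int:
    """Line-based similarity in percent (stand-in for Git's score)."""
    matcher = SequenceMatcher(None, old_text.splitlines(), new_text.splitlines())
    return int(matcher.ratio() * 100)


def detect_renames(old_tree: dict, new_tree: dict, limit_percent: int = 50) -> list:
    """Pair each added path with the most similar deleted path, if any."""
    # Only paths that vanished or appeared are candidates; a path present
    # on both sides (like fileB) is just a modification.
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}

    renames = []
    for new_path, new_content in added.items():
        best = None  # (old_path, score) of the best candidate so far
        for old_path, old_content in deleted.items():
            score = similarity_percent(old_content, new_content)
            if score >= limit_percent and (best is None or score > best[1]):
                best = (old_path, score)
        if best is not None:
            renames.append((best[0], new_path, best[1]))
            del deleted[best[0]]  # each deleted file is used at most once
    return renames


# Assumed contents, following the experiment in the question.
full = "a\nb\nc\nd\ne\nf\ng\n"
trimmed = "b\nc\nd\ne\nf\ng\n"
old_tree = {"fileA": full, "fileB": full}
new_tree = {"fileB": trimmed, "fileC": trimmed}

print(detect_renames(old_tree, new_tree, limit_percent=80))
```

In this sketch fileB never enters the comparison, even though its content is identical to fileC, because it exists on both sides. Swapping the two arguments pairs fileC with fileA instead, which matches the direction of the "rename from tests/fileC" output produced by git diff -M80% HEAD master~1.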
Upvotes: 2