Git merge problem with the same filename in a different location

Question

A weird question:

I have the following example:

folder A\File.txt
folder B\File.txt

The file in folder B is originally a copy of the file in folder A.

A colleague is certain she edited the file in folder A, yet on master the changes were made to the file in folder B. Is this possible? Are there known issues that could cause something like this, or are we looking at human error?

torek · Accepted Answer

This can happen automatically under one condition: Git must "mis-detect" a renamed file. And, for Git to detect any renaming at all, we have to have some file that gets deleted, and some other file that gets created. A rename is therefore "the same" as a delete-and-add, in Git. (That's why it does not matter whether you use git mv, or just remove one file and add another that you made by copying.)

Remember that each commit stores a full snapshot of every file. There is no information available, when comparing any one commit to any other commit, as to how the one commit was changed to become the other commit. Instead, all Git has is the two snapshots. Git must now play a game of Spot the Difference. Of course, instead of two pictures with images in them to be compared (with, e.g., two clocks: are they displaying the same time?), what Git has is two sets of files, with each file having a file name—such as dir1/file.ext and dir2/file.ext, complete with (forward) slashes—and each file in each of the two commits having some contents.

This operation—comparing the two commits—is the realm of the git diff command, and the difference-spotting in Git is done by a diff engine. The diff engine in Git, in this case, starts with a list of every file name in the left-side commit, and every file name in the right-side commit.

If, for whatever reason, the left-side commit contains dir1/file1.ext but not dir2/file2.ext, and the right-side commit contains dir2/file2.ext but not dir1/file1.ext, then we have two stray files. Perhaps, Git says to itself, just maybe, rather than two stray files, we have here a file-pair. Perhaps the user renamed dir1/file1.ext to the new name dir2/file2.ext.¹ In order to decide whether that was the case, Git will compare the contents of these two files. Comparing the contents, Git obtains a percentage that it calls the similarity index.
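
To make the "stray files" and similarity-index ideas concrete, here is a minimal Python sketch. It assumes each commit snapshot is a plain dict mapping path to file content (a hypothetical model, not how Git stores anything), and its similarity_index function is a toy: it counts the bytes in lines shared by both files and divides by the size of the larger file. Git's real scoring differs in detail; this only shows the shape of the idea.

```python
from collections import Counter


def stray_files(left: dict, right: dict) -> tuple:
    """Return (paths only on the left, paths only on the right).

    Each snapshot is a dict of path -> file content. Left-only paths are
    possible rename sources; right-only paths are possible rename targets.
    """
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    return only_left, only_right


def similarity_index(old: str, new: str) -> int:
    """Toy similarity index (0-100) between two file contents.

    Counts the bytes in lines that appear in both files (as a multiset),
    then divides by the size of the larger file. An approximation of the
    idea only, not Git's actual algorithm.
    """
    old_bytes, new_bytes = old.encode(), new.encode()
    larger = max(len(old_bytes), len(new_bytes))
    if larger == 0:
        return 100  # two empty files are trivially identical
    shared = (Counter(old_bytes.splitlines(keepends=True))
              & Counter(new_bytes.splitlines(keepends=True)))
    copied = sum(len(line) * count for line, count in shared.items())
    return copied * 100 // larger


left = {"dir1/file1.ext": "line1\nline2\nline3\nline4\n"}
right = {"dir2/file2.ext": "line1\nline2\nline3\nLINE4\n"}
sources, targets = stray_files(left, right)
print(sources, targets)  # ['dir1/file1.ext'] ['dir2/file2.ext']
print(similarity_index(left[sources[0]], right[targets[0]]))  # 75
```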

Git repeats this process with every possible pairing of unpaired left and right side files, so if the left-side commit has this unpaired dir1/file1.ext and the right side has two unpaired files dir2/file2.ext and dir3/file3.ext, Git will compute a similarity index for the file1 → file2 rename, and a similarity index for the file1 → file3 rename. Whichever has the higher similarity-index score "wins" here. The score must also exceed some minimum threshold. The default is 50%, though you can choose some other value when you run git diff. If the best score exceeds the minimum threshold, Git declares that file1 was in fact renamed, to whichever right-side file won the scoring.

The exact scoring method is obscure, but in practice it works pretty well, at least as long as the file has at least a few kilobytes of text in it. Git also gives a secret 1% boost to the similarity index if the last "component" part of the right-side name matches the left side "component" part, where "components" are the parts separated by the (forward) slashes. So if there are two identical copies of what was dir1/file1.ext, in dir2/file2.ext and dir3/file1.ext, Git will decree that the rename was from dir1/file1.ext to dir3/file1.ext. The similarity scores would have been the same—say, 70% similar—but the file1.ext got a 1% boost, giving dir3/file1.ext a winning 71% score.²
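
The selection step can be sketched the same way. The hypothetical best_rename function below takes already-computed similarity scores (0 to 100) for one left-side file against each unpaired right-side file, applies the 1% same-name boost as described above, and accepts the winner only if its score is at least the threshold (50 by default, mirroring Git's default). It models the behavior described in this answer; it is not a copy of Git's source.

```python
import posixpath


def best_rename(src: str, scores: dict, threshold: int = 50):
    """Pick the right-side path that `src` was most likely renamed to.

    `scores` maps each unpaired right-side path to its similarity index
    (0-100) against `src`. A candidate whose last path component matches
    src's gets a 1-point boost. Returns the winning path, or None if even
    the best (boosted) score is below `threshold`.
    """
    src_name = posixpath.basename(src)
    best_path, best_score = None, -1
    # Sorting makes ties deterministic: the first path alphabetically wins.
    for path, score in sorted(scores.items()):
        if posixpath.basename(path) == src_name:
            score += 1  # the "secret" same-name boost
        if score > best_score:
            best_path, best_score = path, score
    if best_score < threshold:
        return None  # no good match: treat it as a delete plus an add
    return best_path


# Two equally similar copies: the one still named file1.ext wins (71 vs 70).
print(best_rename("dir1/file1.ext",
                  {"dir2/file2.ext": 70, "dir3/file1.ext": 70}))
# Nothing similar enough: no rename is detected, so this prints None.
print(best_rename("dir1/file1.ext", {"dir2/file2.ext": 30}))
```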

¹Git, of course, does not actually talk to itself. But this works well as a mental model for what Git does do. (Also: don't anthropomorphize computers; they hate that.)

²This 1% boost trick is pretty cheesy. Git should have a smarter way of handling this, and modern Git sometimes tries to be smarter and recognize an entire-directory-rename. That code has had a bunch of issues though. Hopefully it's getting more stable and reliable now.

Why am I talking about diff, when you are looking at a merge?

We need to spend time on how git diff works because git merge runs two git diffs.³ The merge process has three input commits:

One of these three is your current commit, as selected by git switch or git checkout (whichever one you used to get here). You might run git switch main, for instance, to select the currently-last commit of main as your current commit.
One of these three is the commit you name on the command line: you might now run git merge feature/tall, for instance, to merge the last commit of feature/tall.
The third, and hugely important, commit is one that Git finds on its own, given these two inputs. This commit is the best common (shared) ancestor of both commits, as found in the commit graph. We'll ignore all the graph theory used to do this here, and just take it as magic, though there's no actual magic involved, and you can run git merge-base --all manually to find it yourself.
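
The "best common ancestor" idea can be illustrated with a brute-force Python sketch over a tiny hypothetical commit graph (a dict mapping each commit name to its list of parents). Here a common ancestor counts as "best" if it is not itself an ancestor of some other common ancestor; when more than one survives, that is the situation in which git merge-base --all prints several commits. Git's real implementation walks the graph much more efficiently.

```python
def ancestors(graph: dict, commit: str) -> set:
    """Return `commit` plus every commit reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def merge_bases(graph: dict, ours: str, theirs: str) -> set:
    """Best common ancestors of `ours` and `theirs` (brute force)."""
    common = ancestors(graph, ours) & ancestors(graph, theirs)
    # Drop any common ancestor that is an ancestor of another common ancestor.
    return {c for c in common
            if not any(o != c and c in ancestors(graph, o) for o in common)}


# A <- B <- C is main; B <- D <- E is feature/tall (hypothetical names).
graph = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
print(merge_bases(graph, "C", "E"))  # {'B'}

# A criss-cross history has two equally good merge bases.
criss = {"A": [], "B": ["A"], "C": ["A"],
         "M1": ["B", "C"], "M2": ["C", "B"]}
print(sorted(merge_bases(criss, "M1", "M2")))  # ['B', 'C']
```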

That last commit, the one Git finds on its own, is the merge base, and it is the key to how Git actually does a merge. Having found the merge base commit, Git now runs the two git diff commands: one from the merge base to your current commit, and one from the merge base to the commit you named. Each of these can find some set of file-rename operations, and if it does, Git takes those renames into account when combining the changes to the various files.

Git might, for instance, decide that from base to "their" (feature/tall) commit, they renamed dir1/file1.ext to dir3/file1.ext. This can only happen if they don't have a dir1/file1.ext in their (feature/tall) tip commit, but of course, that's entirely possible, especially if this detected rename really did happen. But it could be that they removed dir1/file1.ext in some commit, and added a new but unrelated dir3/file1.ext, and Git just misfired in calling that a rename.

Note that when this does happen, it also matters what renames, if any, Git detects in comparing the merge base commit to your (main) tip commit that you switched to with git switch or git checkout. If Git detects that you did the same rename, that's all fine with Git. If Git detects that you don't have a dir3/file1.ext at all, and left dir1/file1.ext named dir1/file1.ext, Git will say that you didn't rename the file, and they did, and will combine these two pieces of work by keeping their rename.

If Git decides that both you and they renamed dir1/file1.ext, that's OK only if you both chose the same final name. In that case, Git keeps the rename. If you both renamed the file, but to two different names, you'll get a rename/rename conflict and git merge will stop the merge before it's done, so as to get help from the user doing the merge. It will become their responsibility to pick the correct file name, and to construct the correct merge result in general.

You can also see rename/delete conflicts, where one "side" of the merge renames a file but the other side deletes it entirely. All of these kinds of conflicts are what I call high-level conflicts. Some people call them tree conflicts. All of them always make git merge stop to get help from the user; the -X ours and -X theirs options do not resolve them.
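
The rename rules in the last few paragraphs can be summarized as a small decision function. This is a hedged sketch, not Git's merge code: combine_renames is a hypothetical name, it looks at a single merge-base path plus the path each side's diff says that file ended up at (None meaning that side deleted it), and it returns either the final path or the kind of high-level conflict. Content merging is deliberately left out.

```python
from typing import Optional, Tuple


def combine_renames(base_path: str, ours: Optional[str],
                    theirs: Optional[str]) -> Tuple[str, Optional[str]]:
    """Decide the fate of `base_path` from each side's rename result.

    Returns ("ok", final_path), ("ok", None) for a clean delete, or
    ("conflict", kind) for a high-level (tree) conflict.
    """
    if ours is None and theirs is None:
        return ("ok", None)  # both sides deleted the file
    if ours is None or theirs is None:
        kept = theirs if ours is None else ours
        if kept != base_path:
            return ("conflict", "rename/delete")
        # One side deleted, the other kept the name. (If that side also
        # edited the content, Git reports a modify/delete conflict; this
        # sketch does not look at content.)
        return ("ok", None)
    if ours == theirs:
        return ("ok", ours)  # same rename on both sides, or no rename at all
    if ours == base_path:
        return ("ok", theirs)  # only they renamed: keep their rename
    if theirs == base_path:
        return ("ok", ours)  # only we renamed: keep our rename
    return ("conflict", "rename/rename")  # two different new names


print(combine_renames("dir1/file1.ext", "dir1/file1.ext", "dir3/file1.ext"))
print(combine_renames("dir1/file1.ext", "dir2/file1.ext", "dir3/file1.ext"))
print(combine_renames("dir1/file1.ext", None, "dir3/file1.ext"))
```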

Note that you don't get a high-level / tree conflict if one side renamed a file (or Git thinks so anyway) and the other side did nothing at all to the same file, or changed its content. Any content changes to the file get merged as usual, with merge conflicts coming about if there are overlapping lines in the changes (or abutting lines, but let's ignore this kind of fine detail). Here, -X options will resolve those low-level conflicts as usual. Git will combine the high-level / tree-level rename on one side, with the no-change on the other side, by doing the rename.

If, during or after a merge, someone notices "hey some file went away" (because it got renamed) and puts back a copy of it, that could lead to what you've described. This could easily happen to someone who is not aware of how Git handles high-level conflicts.
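
Here is a toy replay of that sequence using the question's folders, assuming one hypothetical history: on one branch the file was moved from A/File.txt to B/File.txt, while on the other branch the colleague edited A/File.txt in place. The toy_merge function is a deliberately tiny stand-in for git merge: it only detects renames of byte-identical content, and it only handles the "they renamed, we edited" case. Everything else a real merge does is omitted.

```python
def detect_renames(base: dict, side: dict) -> dict:
    """Map base paths missing from `side` to an added path with identical
    content (a toy detector that only finds 100%-similar renames)."""
    added = {p: c for p, c in side.items() if p not in base}
    renames = {}
    for path, content in base.items():
        if path in side:
            continue
        for new_path, new_content in added.items():
            if new_content == content:
                renames[path] = new_path
                break
    return renames


def toy_merge(base: dict, ours: dict, theirs: dict) -> dict:
    """Merge `theirs` into `ours` (path -> content dicts), honoring
    renames detected on their side only."""
    result = dict(ours)
    for old, new in detect_renames(base, theirs).items():
        if old in result:
            # We kept the old name (and maybe edited it); they renamed it.
            # Keep their rename and carry our content to the new path.
            result[new] = result.pop(old)
    for path, content in theirs.items():
        if path not in base and path not in result:
            result[path] = content  # files they simply added
    return result


base = {"A/File.txt": "original\n"}
ours = {"A/File.txt": "original\ncolleague's edit\n"}  # edited in folder A
theirs = {"B/File.txt": "original\n"}                 # moved to folder B

merged = toy_merge(base, ours, theirs)
print(merged)  # {'B/File.txt': "original\ncolleague's edit\n"}

# Later, someone notices A/File.txt "went away" and restores the old copy.
# Now A/File.txt holds the old text; the edit lives only in B/File.txt.
merged["A/File.txt"] = base["A/File.txt"]
```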

³Again, the reality is actually much more complicated than this, but this suffices to get a good mental grasp of what merge does.
