Reputation: 701

Renaming file name to older name loses history in Git

I have two files in my git repo.

Editor.cs
Editor2.cs

The first file is the older version of this class, which I kept around until I was sure the new version (Editor2.cs) worked.

Now, I want to delete Editor.cs and rename Editor2.cs to Editor.cs. However, when I do this in git (i.e., do a git rm on Editor.cs, then a git mv on Editor2.cs to Editor.cs), I seem to lose the history of Editor2.cs.

That is, when I view the log for the (newly renamed) Editor.cs, it shows me the history for the original Editor.cs (not Editor2.cs).

I've tried using git log --follow and it still shows me the history for the wrong file.

Sorry if this is confusing...

Thanks

Upvotes: 3

Answers (1)

torek

Reputation: 488183

There are some things you can do, if you want git log --follow to follow your file. It has nothing to do with using git mv, though. I will describe how Git works, and you can then decide what, if anything, you wish to do.

In fact, Git does not lose the history of the file when you do what you are suggesting. What Git loses is the ability to follow that history via git log --follow. This seemingly contradictory state arises because Git never had the history of any file in the first place.

What Git has—what's stored in the repository—is a series of commits. Each commit is a complete snapshot of all files (as of that commit). When you run git log (without --follow), Git shows you these commits, one at a time, in some order. If the order is based on when the commits were made,¹ that shows you the history—or at least some subset of the history. But that's the history of commits, not of files.

Using git log --follow tells Git: Keep some commits hidden. Walk through the commit history as usual, but only tell me about commits where the difference between the immediate predecessor of this commit, and this commit itself, changed the one file I name on the command line. This is not file history; this is a subset of commit history, based on one file's name.
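As a rough illustration of that filtering, the sketch below builds a throwaway repository and compares the full commit list with the path-limited one. The file names a.txt and b.txt and the commit messages are made up for this example.

```shell
#!/bin/sh
set -e
# Throwaway repository; file names and commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "first" > a.txt
git add a.txt
git commit -q -m "add a.txt"

echo "other" > b.txt
git add b.txt
git commit -q -m "add b.txt"

echo "second" >> a.txt
git commit -q -am "edit a.txt"

# Full commit history: every commit, newest first.
all=$(git log --format=%s)
echo "$all"

# Same walk, but commits that did not change a.txt are hidden.
only_a=$(git log --follow --format=%s -- a.txt)
echo "$only_a"
```

The second listing should omit the "add b.txt" commit: the commit is still in the repository, it is just not shown.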

The trick is that with --follow, Git will automatically switch from looking for a file named, say, new.txt, to a file named, say, old.txt, if and when the change in some commit was to rename the file. This particular trick is enabled in a somewhat peculiar fashion, though. Git uses rename detection, in which it compares two commits and tries to guess (rather than remembering) whether some file(s) were renamed across those commits.²

To compare the parent commit to the child commit (two commits made in succession), Git extracts—well, virtually extracts—each of the two commits. It then looks at all the file names in each of the two commits. A file named read.me in the old commit is automatically assumed to be "the same" file as read.me in the new commit. But if the old commit has an old.txt and the new one does not, while the new commit has a new.txt and the old one does not have this file, then—for rename detection at least—Git will look at the contents of the two files. If what's in old.txt is sufficiently similar to what's in new.txt,³ Git decrees that these must be "the same" file, and therefore, old.txt got renamed to new.txt.

Git doesn't look at the files' names here, but instead at the data.⁴ So it won't detect Editor.cs as similar to Editor2.cs unless the contents really are pretty similar. But more importantly, it will only compare the two files' content if there is one commit (the parent) that has only one of the files by one of those names, followed immediately by a second commit (the child) that has just one of those files (by the other name). If one commit has both files, and the other commit has one file, Git will just say the file was added (parent lacks + child has) or deleted (parent has + child lacks).
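The difference between the two situations can be seen with git diff and its -M (find renames) option. This is only a sketch in a throwaway repository; old.txt, new.txt and copy.txt are hypothetical names chosen for the example.

```shell
#!/bin/sh
set -e
# Throwaway repository; all file names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > old.txt
git add old.txt
git commit -q -m "add old.txt"

# Case 1: the old name vanishes and the new name appears in one commit.
git mv old.txt new.txt
git commit -q -m "rename old.txt to new.txt"
paired=$(git diff -M --name-status HEAD~1 HEAD)
echo "$paired"

# Case 2: the child commit has both names, the parent has only one.
cp new.txt copy.txt
git add copy.txt
git commit -q -m "add copy.txt next to new.txt"
added=$(git diff -M --name-status HEAD~1 HEAD)
echo "$added"
```

The first diff should report a rename (status R with a similarity score) and the second a plain addition (status A), because new.txt is still present in the parent and so there is no vanished name to pair copy.txt with.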

Thus, to get this detected as "same file but different name", you will need to make one commit that has only one of those files, followed by a newer commit that has only the other of those files. The content of the two files must also be sufficiently similar. "Exactly the same" is best—a 100% identical match is found much faster than a lesser match—and here using git mv can guarantee that you make a commit that has a 100% matching file under a changed name, which git log --follow is guaranteed to detect. But the key is to have the old name vanish and the new name appear, all in one commit. In your case, that means you will need to remove the new file, commit, then either:

rename the old file in place and commit again (making a 100% match), and then replace the old file's content with the new content (no name change, so Git believes the file is the same), or
remove the old file and create the new one and commit, so that there is a detectable rename, which will actually be detected if and only if the contents are 50% or more similar.

Doing either of these should result in git log --follow detecting a rename from the new name to the old name, after which git log --follow will start looking for the old name, and suppressing commits that do not modify a file that has the old name.
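One possible reading of this for the files in the question, assuming the goal is for Editor.cs to end up with the content of Editor2.cs and for git log --follow to trace back through Editor2.cs, is sketched below. The file contents and commit messages are invented for the example. The important part is that the removal and the rename land in two separate commits, so that the second commit has a name that vanishes (Editor2.cs) and a name that appears (Editor.cs).

```shell
#!/bin/sh
set -e
# Throwaway repository; contents and commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'legacy draw call %s\n' 1 2 3 4 5 > Editor.cs
git add Editor.cs
git commit -q -m "add Editor.cs"

printf 'rewritten render step %s\n' 1 2 3 4 5 6 7 8 > Editor2.cs
git add Editor2.cs
git commit -q -m "add Editor2.cs"

printf 'rewritten render step 9\n' >> Editor2.cs
git commit -q -am "improve Editor2.cs"

# Commit A: remove the old implementation on its own.
git rm -q Editor.cs
git commit -q -m "remove old Editor.cs"

# Commit B: Editor2.cs vanishes and Editor.cs appears, with identical content.
git mv Editor2.cs Editor.cs
git commit -q -m "rename Editor2.cs to Editor.cs"

followed=$(git log --follow --format=%s -- Editor.cs)
echo "$followed"
```

In this sketch the followed log should list the rename commit and then the two Editor2.cs commits, and should not include the commit that added the original Editor.cs.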

¹By default, the order is almost, but not quite, the order commits were made in. It's actually rather complicated. Adding --topo-order enforces a strict topologically-correct order, although given branching and merging, some commits will have no ancestor/descendant relationships.

²One advantage of doing rename detection like this is that Git can detect renames when comparing commits that are not immediately adjacent, such as when doing merges. The disadvantage is obvious: it's hard! It would be much easier to record "file X was renamed". Other version control systems do that.

³The default is to require that the files be "50% similar", using a computed similarity index. When using git diff you can tune this threshold (for example, with -M90% or --find-renames=90%). When using git log --follow you get just the 50% number.
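A small sketch of tuning that threshold with git diff follows. It assumes a hypothetical file in which 7 of 10 lines survive the rename, which should put the similarity index somewhere between the default 50% and the stricter 90% used here.

```shell
#!/bin/sh
set -e
# Throwaway repository; file names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'shared line %s\n' 1 2 3 4 5 6 7 8 9 10 > old.txt
git add old.txt
git commit -q -m "add old.txt"

# Rename the file and rewrite its last three lines in the same commit.
git mv old.txt new.txt
{
  printf 'shared line %s\n' 1 2 3 4 5 6 7
  printf 'changed row %s\n' 8 9 10
} > new.txt
git add new.txt
git commit -q -m "rename and edit"

# Default threshold (50%): the pair is similar enough to count as a rename.
default=$(git diff -M --name-status HEAD~1 HEAD)
echo "$default"

# Stricter threshold (90%): the same pair is reported as a delete plus an add.
strict=$(git diff -M90% --name-status HEAD~1 HEAD)
echo "$strict"
```

With the default threshold the diff should show a single R line for old.txt and new.txt; at 90% it should fall back to a separate A and D.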

⁴The similarity index is given a 1% boost if the file's path names have the same final component. That is, dir/sub/file.txt might be 49% similar to dir2/sub2/xyz.txt and 49% similar to dir2/sub2/file.txt, but the latter has the same file.txt final component, so it gets boosted to the magic 50% mark.

Upvotes: 2
