guy_from_nowhere

Reputation: 41

Git does not track history after rename/move of modified file

Problem: I need to move entire directories within a repository. To do that I use git mv and, where needed, update the include header names to match the new locations. The problem arises when I do both in a single commit: the file history is lost (Git treats this as deleting the old files and creating new ones).

Workaround: If I split those actions to separate commits the issue does not occur.
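For reference, the split-commit workaround can be reproduced in a scratch repository. This is only a sketch: the file names old.h and new.h and the file contents are hypothetical, and it assumes git is on the PATH. It also shows why the workaround is fragile: --follow walks commit by commit, while a direct comparison of the endpoints no longer sees a rename.

```shell
set -e
# Scratch repository; all file names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'original line %02d\n' 1 2 3 4 5 6 7 8 9 10 > old.h
git add old.h
git commit -q -m "add old.h"
first=$(git rev-parse HEAD)

# Commit 1 of the workaround: rename only, content untouched.
git mv old.h new.h
git commit -q -m "rename old.h to new.h"

# Commit 2 of the workaround: rewrite 8 of the 10 lines.
{ printf 'original line %02d\n' 1 2; printf 'rewritten line %02d\n' 3 4 5 6 7 8 9 10; } > new.h
git commit -q -a -m "edit new.h"

# Walking commit by commit, --follow crosses the rename.
followed=$(git log --follow --oneline -- new.h | wc -l | tr -d ' ')

# Comparing the endpoints directly, the files are too different to pair up.
direct=$(git diff --find-renames --name-status "$first" HEAD)

echo "commits found by --follow: $followed"
echo "$direct"
```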

The problem reappears: However, even when I use the above workaround, the problem returns when merging into master. I am required to use no-ff merges only. In this situation the new merge commit on the master branch combines the changes from both commits and... history is still not tracked properly.

Another ugly workaround: I can deliver those commits separately to master. I cannot deliver uncompilable code but if I exclude it from the building process it could be doable... But it is ugly and so wrong...

I am wondering if there is a better solution to this problem.

Upvotes: 4

Views: 1924

Answers (2)

Alexandre Bodi

Reputation: 454

You can make --follow the default for the git log command (it takes effect only when git log is given a single path):

git config --global log.follow true
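A quick way to see the effect, sketched in a throwaway repository so that no global configuration is touched. The setting is applied repo-locally here instead of with --global, and the file names are made up for the illustration.

```shell
set -e
# Scratch repository; old.h and new.h are hypothetical names.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > old.h
git add old.h
git commit -q -m "add old.h"

git mv old.h new.h
git commit -q -m "rename old.h to new.h"

# Without --follow, the log for new.h stops at the rename commit.
plain=$(git log --oneline -- new.h | wc -l | tr -d ' ')

# With log.follow enabled, a single-path git log behaves like git log --follow.
git config log.follow true
followed=$(git log --oneline -- new.h | wc -l | tr -d ' ')

echo "without follow: $plain, with log.follow: $followed"
```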

Upvotes: 1

torek

Reputation: 487755

If you are familiar with almost any other Version Control System (VCS), it can be very difficult to understand what Git does with file history.

The fact is that Git doesn't have file history. It may be unique among VCSes here (though I don't have experience with many of the more arcane VCSes). Its closest cousin, Mercurial, does have file history: each file added to Mercurial is assigned a unique number in what Mercurial calls the manifest, and this determines the file's identity. If you change the name of a file—or an entire directory full of files—they retain their identities, because this information exists in the manifest.

Git does away with this notion entirely. Git has no file history at all. Git has only commits.

Each commit stores a complete snapshot of a source tree. Each commit also has some number of parent commits, usually just one. This is much more like traditional commit-based VCSes: one can trace through the various commits, or look at file history. But since Git doesn't have file history, the only thing it has is commit history.
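To make the snapshot-plus-parents idea concrete, here is a small sketch in a scratch repository (hypothetical file names a.txt and b.txt). The second commit only adds b.txt, yet its tree still lists a.txt, and the commit object records exactly one parent.

```shell
set -e
# Scratch repository; a.txt and b.txt are hypothetical names.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "first"

echo two > b.txt
git add b.txt
git commit -q -m "second"

# The commit object names one tree (the snapshot) and its parent commit(s).
git cat-file -p HEAD

# The snapshot lists every file, including a.txt, which this commit did not touch.
files=$(git ls-tree -r --name-only HEAD)
echo "$files"

# rev-list --parents prints the commit hash followed by its parent hashes.
parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parent count: $parents"
```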

In order to implement git log --follow and other useful items, what Git offers, instead of file history, is rename detection. Git can look at any one specific commit and compare that commit to its parent commit (or, for merge commits, to all of its parents). When it does this comparison, it offers the option of detecting files that were renamed via that commit: files that had one name in the parent, but a different name in the child.[1]

Git even offers this rename detection when comparing two arbitrary commits, that are not just parent-and-child. Running:

git diff --find-renames $hash1 $hash2

compares the two commits, and wherever there is a candidate for "file with path a/b/c.txt in $hash1 sure looks a lot like file with path d/e/f.log in $hash2", Git may claim that the file was renamed (and then perhaps modified as well). It's important to remember, though, that Git is merely synthesizing a way to transform the first file into the second. The two actual files in the two commits are stored that way permanently. They can never be changed: as long as those commits exist, those two files are stored that way in those two commits. Those two files are not actually related at all unless you want them to be. Git is "finding" a rename by comparing them for similarity. Give Git a different set of "similarity" criteria—e.g., -M75% instead of -M50%—and Git may choose a different set of "sufficiently similar" files.
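The threshold effect can be sketched as follows. The file contents are contrived so that 6 of 10 equal-length lines survive the move, which should put the similarity between the two thresholds; the paths reuse the a/b/c.txt and d/e/f.log names from above, and everything else is an assumption made for the demonstration.

```shell
set -e
# Scratch repository; contents are contrived for the illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

mkdir -p a/b d/e
printf 'original line %02d\n' 1 2 3 4 5 6 7 8 9 10 > a/b/c.txt
git add a/b/c.txt
git commit -q -m "add a/b/c.txt"
hash1=$(git rev-parse HEAD)

# Move the file and rewrite 4 of its 10 lines in the same commit.
git mv a/b/c.txt d/e/f.log
{ printf 'original line %02d\n' 1 2 3 4 5 6; printf 'modified line %02d\n' 7 8 9 10; } > d/e/f.log
git add d/e/f.log
git commit -q -m "move and edit"
hash2=$(git rev-parse HEAD)

# Same two frozen commits, two different similarity thresholds.
loose=$(git diff --find-renames=50% --name-status "$hash1" "$hash2")
strict=$(git diff --find-renames=75% --name-status "$hash1" "$hash2")

echo "at 50%:"; echo "$loose"
echo "at 75%:"; echo "$strict"
```

At the looser threshold the pair is reported as a rename (an R status line); at the stricter one the same two commits yield a delete plus an add.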

Nothing has happened to any of the commits. They are all frozen in time. But with a different set of "rename threshold" values, "break thresholds", and so on, Git may pair up different path names. Given --no-renames, Git will never pair up different path names (though it will still pair up files with the same name).
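A small sketch of the --no-renames behaviour, again in a scratch repository with made-up file names: one file is renamed without changes and another is edited in place. With rename detection the renamed file is paired across its two names; with --no-renames it becomes a delete plus an add, while the same-name file is still reported as modified.

```shell
set -e
# Scratch repository; moved.txt, renamed.txt and kept.txt are hypothetical names.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'alpha %s\n' 1 2 3 4 5 > moved.txt
printf 'beta %s\n' 1 2 3 4 5 > kept.txt
git add .
git commit -q -m "base"

# Rename one file untouched, edit the other in place.
git mv moved.txt renamed.txt
echo 'beta 6' >> kept.txt
git commit -q -a -m "rename one file, edit another"

paired=$(git diff --find-renames --name-status HEAD~1 HEAD)
unpaired=$(git diff --no-renames --name-status HEAD~1 HEAD)

echo "with rename detection:"; echo "$paired"
echo "with --no-renames:"; echo "$unpaired"
```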

(This dynamic rename detection matters, sometimes a great deal, when merging, because merge runs two git diff --find-renames operations, from the merge base commit to each of the two branch tip commits that are being merged. If Git finds a rename, it believes it. If it does not find a rename, it believes that the base file was deleted in the tip, and a different file was created in the tip. You can control the rename threshold, but you cannot set break or copy threshold values, at least in Git versions up to today, 2.15.)
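Here is a sketch of how the threshold changes a no-ff merge. It assumes a Git new enough to accept the -X find-renames=<n> merge strategy option (older versions spell it -X rename-threshold=<n>); the branch name mover, the file names, and the contents are all invented for the demonstration. One branch moves and edits a file, the other edits the original in place.

```shell
set -e
# Scratch repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'original line %02d\n' 1 2 3 4 5 6 7 8 9 10 > old.txt
git add old.txt
git commit -q -m "base"
trunk=$(git symbolic-ref --short HEAD)

# Side branch: move the file and rewrite 4 of its 10 lines.
git checkout -q -b mover
git mv old.txt new.txt
{ printf 'original line %02d\n' 1 2 3 4 5 6; printf 'modified line %02d\n' 7 8 9 10; } > new.txt
git commit -q -a -m "move and edit"

# Trunk: edit the first line of the file under its old name.
git checkout -q "$trunk"
{ echo 'changed on trunk'; printf 'original line %02d\n' 2 3 4 5 6 7 8 9 10; } > old.txt
git commit -q -a -m "edit line 1"

# With a strict threshold no rename is found, so the merge stops with a conflict.
if git merge -q --no-ff -X find-renames=90% -m "strict merge" mover >/dev/null 2>&1; then
  strict=clean
else
  strict=conflict
  git merge --abort
fi

# With the default threshold the rename is found and the trunk edit lands in new.txt.
git merge -q --no-ff -m "default merge" mover

echo "strict threshold: $strict"
head -n 1 new.txt
```

With the strict threshold Git sees old.txt as deleted on one side and modified on the other; with the default one it carries the trunk edit over to new.txt.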


[1] The meaning of this is less clear for merge commits, since there is more than one parent: what does it mean for file child.txt to have had name p1.txt in parent #1 and p2.txt in parent #2? Traditional VCSes, with their unique internal numbering systems that determine file identity, assign a clear meaning here, but in practice, this meaning is not always useful, and Linus Torvalds' choice here, to do away with this notion entirely, may have been in part a reaction to that.

Upvotes: 3
