Colin Ford

Reputation: 121

Git: Getting a file's path at a given commit hash

Say I have a repository with a file, a.txt, that at some point got renamed to b.txt. Say this happened a long time ago in the history, or maybe a few times, so I'm not quite sure what the file name was at a certain commit.

If I run a git log on b.txt with the --follow flag it looks something like this:

$ git log --name-status --pretty=oneline --abbrev-commit --follow -- 'b.txt'

45c5d11 (HEAD -> master) Edit to 'b.txt'
M       b.txt
cb4ce19 Renaming 'a.txt' to 'b.txt'
R100    a.txt   b.txt
13973ff Edit to 'a.txt'
M       a.txt
9620e34 Adding 'a.txt'
A       a.txt

Exactly what I want! But, when I try to show the file at a commit when it was named something else, I get this error:

$ git show 9620e34:'b.txt'
fatal: path 'b.txt' exists on disk, but not in '9620e34'

So, is there either:

Upvotes: 2

Views: 446

Answers (1)

torek

Reputation: 489708

The short answer is just "no". :-) Essentially, you must run git log --follow and find the name-change yourself, if you can find it at all, and then use the correct name, manually.

The real problem here is that each commit is a complete, but independent, snapshot of all of your files:

  • Commit 9620e34 has one file, a.txt. That's all it has. It has no b.txt. (Well, it might have other files that are neither a.txt nor b.txt; I can't tell from the above whether this is the case.)

  • Commit 13973ff has a complete (but different) copy of a.txt.

  • Commit cb4ce19 has a complete copy of b.txt and no a.txt at all. The copy of b.txt in cb4ce19 matches, 100%, the copy of a.txt in 13973ff—but these are two different file names.

  • The last commit, 45c5d11, has a complete (but different) copy of b.txt.
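The independence of these snapshots can be checked directly. The following is a minimal sketch that rebuilds a history shaped like the one in the question inside a throwaway directory, then lists the files stored in each commit with git ls-tree. The temporary directory, the commit helper, and the file contents are assumptions made for illustration; the hashes will differ from those in the question.

```shell
#!/bin/sh
set -e
# Throwaway repository; nothing here touches your real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical helper: commit with a fixed demo identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

echo one > a.txt;    git add a.txt; commit "Adding 'a.txt'"
echo two >> a.txt;   git add a.txt; commit "Edit to 'a.txt'"
git mv a.txt b.txt;                 commit "Renaming 'a.txt' to 'b.txt'"
echo three >> b.txt; git add b.txt; commit "Edit to 'b.txt'"

# For each commit, oldest first, print the files held in its snapshot.
for c in $(git rev-list --reverse HEAD); do
    files=$(git ls-tree -r --name-only "$c" | tr '\n' ' ')
    echo "$(git rev-parse --short "$c"): $files"
done
```

Each line of output names exactly one file: the two oldest commits list only a.txt and the two newest list only b.txt. Nothing in a snapshot records that the two names are related.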

When you use git log, what git log does is walk through the commits, one commit at a time, starting where you are now (or where you specify) and working backwards. It can optionally compare each commit to its (single) parent,[1] with the parent commit on the left and the child commit on the right. The comparison of the two snapshots (the diff) produces, as its output, a recipe: Do these things and you'll change the left-side snapshot's files so that they match the right-side snapshot's files. Usually that's pretty close to what someone actually did, or even exactly what they did. Occasionally, it's not what they did at all, but it produces the same result.

With --name-status, git log prints just the file name and a status letter, rather than the full recipe.[2] When the recipe includes "rename this file" you get the R letter, and a similarity index (percentage): 100 means the files matched, 100% identical, on the left and right sides. This kind of match is much faster than approximate matches, so when you do the rename without also changing the content of the file, as in this case, it's being nice to Git. (It might sometimes be nice if Git were nicer back to you.)
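To see the similarity index change, here is a small sketch that compares a pure rename with a rename that also edits the file. The repository, file names, and contents are made up for the demonstration, and the exact percentage Git reports for the edited rename depends on the contents, so the script only prints it rather than assuming a value.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical helper: commit with a fixed demo identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# A 20-line file gives the approximate matcher enough content to work with.
i=1
while [ "$i" -le 20 ]; do echo "line $i"; i=$((i + 1)); done > a.txt
git add a.txt; commit "Adding 'a.txt'"

git mv a.txt b.txt; commit "Pure rename"

git mv b.txt c.txt
echo "one extra line" >> c.txt
git add c.txt; commit "Rename plus edit"

# The first tab-separated field is the status letter plus similarity index.
exact=$(git diff -M --name-status HEAD~2 HEAD~1 | cut -f1)
approx=$(git diff -M --name-status HEAD~1 HEAD | cut -f1)
echo "pure rename:      $exact"
echo "rename plus edit: $approx"
```

The pure rename reports R100, while the rename that appends a line should report an R status with a lower number.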

Adding the --follow option turns on Git's rename detection machinery. It may or may not already be on for various other operations—you can configure git diff to control this, for instance, when running git diff without specifying -M to turn it on manually—but this guarantees that it's on for this particular kind of git log. Finding renames requires that the rename detector run. The rename detector is computationally expensive (though much less so for 100% matches), so git log operations that don't need it generally don't use it at all.
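The effect of the rename detector is easy to see by diffing the same two snapshots with it switched off and on. This is a sketch on a made-up two-commit repository; --no-renames and -M are the command-line switches, and diff.renames is the configuration setting that controls the default for git diff.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical helper: commit with a fixed demo identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

echo hello > a.txt; git add a.txt; commit "Adding 'a.txt'"
git mv a.txt b.txt;                commit "Renaming 'a.txt' to 'b.txt'"

# Detector off: the recipe is "delete a.txt, add b.txt".
without=$(git diff --no-renames --name-status HEAD~1 HEAD)
# Detector on: the same pair of snapshots yields a single rename.
with=$(git diff -M --name-status HEAD~1 HEAD)

printf 'without rename detection:\n%s\n' "$without"
printf 'with rename detection:\n%s\n' "$with"
```

Both recipes turn the left snapshot into the right one; only the second mentions a rename, because the rename is inferred by the diff and is not stored in either commit.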

The --follow implementation, however, is pretty sleazy: internally, Git chops down each commit to the one file of interest in the child commit, and then compares that file to its counterpart in the parent commit. If the file isn't in the parent, Git invokes the rename detector for that one file (rather than for all files as usual) to see if it can find a matching-content-but-different-name file in the parent.[3] If so, it simply changes the name that it looks for, while telling you about this one rename. From this point backwards (remember, git log is working backwards), it starts looking for the old name instead of the new one.

When you use git show, you point it directly to some commit. It does not get there by starting at the last commit and working backwards, the way git log does. It just starts right at the commit you name. That commit has a parent hash ID (stored in the commit's metadata) and git show can therefore git diff the parent and child, if you use it as git show <hash>; that commit has a snapshot, and git show can therefore extract and display one file, if you use it as git show <hash>:<path>. But it hasn't walked backwards from the commit where you are now to that one commit you specified, so it has no idea whether the path you give here might correspond to some other, different, path in the commit you specified.
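The two forms of git show, and the failure from the question, can be reproduced with a short sketch. The repository and its contents are invented for the demonstration.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical helper: commit with a fixed demo identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

echo one > a.txt; git add a.txt; commit "Adding 'a.txt'"
first=$(git rev-parse HEAD)
git mv a.txt b.txt; commit "Renaming 'a.txt' to 'b.txt'"

# Form 1, git show <hash>: diff the commit against its parent.
git show -M --name-status --format=%s HEAD

# Form 2, git show <hash>:<path>: extract one file from that snapshot.
old=$(git show "$first:a.txt")
echo "a.txt in the first commit contains: $old"

# The new name does not exist in the old snapshot, so this lookup fails.
if git show "$first:b.txt" >/dev/null 2>&1; then found=yes; else found=no; fi
echo "b.txt present in the first commit: $found"
```

The lookup in the second form goes straight to the named snapshot and uses the path exactly as given; no history is walked, so no rename can be discovered along the way.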

Git probably should have some command to trace a file's name: you'd give it an optional starting point, defaulting to HEAD, a path name, and a target commit; and it would produce the result of doing the equivalent of git log --follow on that name and discovering renames until reaching the target, or produce an error—or perhaps just print the original path—if it never reaches that target commit. This would be a useful plumbing command. (It probably should take multiple path names, too, and git log's rename detector should be beefed-up and used to implement this command.) But it doesn't exist.
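Until something like that exists, the manual procedure can be scripted. The sketch below is one possible approximation, not a Git feature: trace_path is a hypothetical helper that runs git log --follow over the commits between the target and the starting point and swaps in the old name each time it sees a rename. It assumes the target is an ancestor of the starting point and that path names contain no tabs or characters Git would quote, and it inherits whatever --follow gets wrong, particularly across merges.

```shell
#!/bin/sh
set -e

# trace_path is a hypothetical helper, not an existing Git command.
# Usage: trace_path <target-commit> <path> [<start-commit>]
# Prints the name that <path> (as known at the start commit, default HEAD)
# had in <target-commit>, or returns 1 if no such file is found there.
trace_path() {
    target=$1
    name=$2
    start=${3:-HEAD}
    # git log lists commits newest first. Whenever the tracked name is the
    # destination of a rename (R) or copy (C), switch to the source name.
    name=$(git log --follow --name-status --format= "$target..$start" -- "$name" |
        awk -F '\t' -v name="$name" '
            /^[RC]/ && $3 == name { name = $2 }
            END { print name }')
    # Report the name only if it really exists in the target snapshot.
    git cat-file -e "$target:$name" 2>/dev/null || return 1
    printf '%s\n' "$name"
}

# Demonstration on a throwaway repository shaped like the one in the question.
repo=$(mktemp -d)
cd "$repo"
git init -q
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }
echo one > a.txt;    git add a.txt; commit "Adding 'a.txt'"
echo two >> a.txt;   git add a.txt; commit "Edit to 'a.txt'"
git mv a.txt b.txt;                 commit "Renaming 'a.txt' to 'b.txt'"
echo three >> b.txt; git add b.txt; commit "Edit to 'b.txt'"

first=$(git rev-list --max-parents=0 HEAD)
old_name=$(trace_path "$first" b.txt)
echo "b.txt was called $old_name in $(git rev-parse --short "$first")"
git show "$first:$old_name"
```

In a real repository the same helper would be used as git show "<hash>:$(trace_path <hash> b.txt)", with the caveats above.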


[1] When git log hits a merge commit, which, by definition, is one with two or more parents, git log usually just doesn't do any comparing at all. When using --follow it changes strategies somewhat, but --follow works poorly across merges.

[2] It also cuts out actually finding the changes beyond whatever is necessary to come up with the status letter, as finding a change-recipe is slow.

[3] When looking at merges, I'm not sure what the rename detector does. It has all parents available, but the code here is very squirrelly and hard to follow. In any case there's only a single global variable holding the to-be-followed name, so if the name is different in two different parents, the code literally cannot handle this correctly, for any sensible definition of "correctly".

Upvotes: 4
