max

Reputation: 52263

Which files does "git checkout" modify?

I was on the feature branch and wanted to check out develop.

Relative to the develop branch, file "D1" was modified in my feature branch.

Relative to the develop branch, file "M" was modified in my working tree, and "D1" and "D2" were deleted.

Relative to the feature branch, my working tree of course had files "D1" and "D2" deleted and "M" modified. Nothing was staged.

When I ran git checkout develop, git added back file "D1" to the working tree, but did nothing about "D2" and "M". Why did git treat "D1" and "D2" differently? And why didn't it complain about "M" being modified - even though it's not committed yet?

I tried to follow the explanation in this SO answer about which files git checkout new_branch can safely modify. But it wasn't enough to understand this outcome.

Edit (answer to @torek comment):

Initially, I'm on feature. git status --short says (I put file names in quotes to make it obvious):

D "D1"
D "D2"
M "M"

.gitignore is empty.

git diff --name-status feature develop says:

M "D1"

git rev-parse commands only yield commit IDs.

Upvotes: 0

Views: 357

Answers (1)

torek

Reputation: 488223

The setup

We don't have raw commit IDs here, and we can't draw much of a commit graph, but we don't need either. Just for concreteness, I'll draw in some pretend commits, so that there are things to refer to and we don't have to write out the names feature and develop all the time.

What we do have is two different commits (which are part of a commit graph), an index, and a work-tree. We're on branch feature initially as well.

Some definite things about the two commits

Let's draw the commits first:

...--A--B--...--D     <-- develop
         \
          C--...--F   <-- HEAD -> feature

Commit D is the tip of branch develop and commit F is the tip of branch feature. Since we're on branch feature, the file .git/HEAD contains a reference to the name feature (the line ref: refs/heads/feature), which is how Git knows to work with/on branch feature.
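As a quick illustration of how HEAD names the current branch, here is a minimal sketch in a throwaway repository. The temporary directory and the variable names are my own, and it assumes Git's default files-based ref storage, where .git/HEAD is a plain file.

```shell
#!/bin/sh
# Sketch: inspect what HEAD records while "on" a branch named feature.
set -e
repo=$(mktemp -d)        # throwaway directory, not the asker's repository
cd "$repo"
git init -q

# Point HEAD at branch "feature" (the branch need not have commits yet).
git symbolic-ref HEAD refs/heads/feature

head_file=$(cat .git/HEAD)                 # raw file contents
head_ref=$(git symbolic-ref HEAD)          # full ref name HEAD points to
head_short=$(git symbolic-ref --short HEAD) # just the branch name

printf '%s\n%s\n%s\n' "$head_file" "$head_ref" "$head_short"
```

Reading the symbolic ref with git symbolic-ref is generally safer than reading the file directly, since it does not depend on how refs are stored.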

Commit D contains a bunch of files, including at least file D1. It's not immediately obvious from the output above whether commit D contains files D2 and M. (It's usually best to show exact output from each command, in case there's something subtle going on.) But running git status --short is supposed to have printed this:

D "D1"
D "D2"
M "M"

This isn't quite right. (Here's some actual output from a different git status --short:

$ git status --short
 M pytest/p9conn.py
?? backend/backend.h+

Note the leading space in front of pytest/p9conn.py, above the first ? on the next line: this indicates that the M in the output is telling us that the modification is between the index and work-tree, rather than the current commit and the index.) I'm going to assume that instead, it printed:

 D D1
 D D2
 M M

which includes a leading space in front of each D. That means that files D1 and D2 have specifically been removed, but not git rm-ed.

(Perhaps the actual output was instead:

D  D1
D  D2
M  M

Note the two spaces between each letter-code and the file names. In this case, the two D files have been git rm-ed, and file M is git add-ed. But you said "nothing was staged" so I'm assuming one space, then the letter-code, then one more space, then the file names.)
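The difference between the two column layouts is easy to reproduce. The following is a small sketch in a scratch repository; the file contents, the identity settings, and the variable names are invented for illustration.

```shell
#!/bin/sh
# Sketch: the two columns of `git status --short`.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'one\n' > D1
printf 'two\n' > M
git add D1 M
git commit -q -m "initial"

# Change the work-tree only: nothing is staged yet.
rm D1
printf 'edit\n' >> M
unstaged=$(git status --short)   # letter-code in the second column

# Now stage the same changes.
git rm -q D1
git add M
staged=$(git status --short)     # letter-code in the first column

printf 'unstaged:\n%s\nstaged:\n%s\n' "$unstaged" "$staged"
```

The first column describes the difference between HEAD and the index, and the second column describes the difference between the index and the work-tree.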

The index/staging-area and work-tree

If nothing was staged, the index (or staging area) must necessarily match the current commit (which, remember, is F: HEAD points to branch-name feature and branch-name feature points to commit F). Since git status --short had spaces before each letter-code, this also tells us that the index has the same entries as the HEAD commit, i.e., that nothing is yet staged for a new commit.

But we did see two D (deleted) files: D1 and D2 are both showing as deleted from the work-tree. To be deleted from the work-tree, they must be in the index. That means files D1 and D2 are both in the index right now—and as we just noted, the index matches the HEAD commit. Likewise, we saw one M file. To be modified, file M must be in the index, and hence in commit F. This tells us more things for certain about the current HEAD commit F. Let's write them in.
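One way to check these claims directly, rather than inferring them from git status, is to ask Git about the index. This is a sketch in a scratch repository with made-up file contents; it is not the asker's repository.

```shell
#!/bin/sh
# Sketch: files deleted from the work-tree are still present in the index.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'd1\n' > D1
printf 'd2\n' > D2
printf 'm\n'  > M
git add D1 D2 M
git commit -q -m "initial"

# Work-tree changes only, as in the question.
rm D1 D2
printf 'local edit\n' >> M

# `git diff --cached --quiet` exits 0 when the index matches HEAD.
if git diff --cached --quiet; then staged=no; else staged=yes; fi

index_files=$(git ls-files)            # everything in the index
deleted=$(git ls-files --deleted)      # in the index, missing from work-tree

printf 'staged: %s\nindex:\n%s\ndeleted:\n%s\n' "$staged" "$index_files" "$deleted"
```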

More definite things about the two commits

We now know for certain that files D1, D2, and M are all three in commit F. And, we ran:

$ git diff --name-status feature develop
M "D1"

This tells us that files D1, D2, and M are also in commit D!

Obviously file D1 is in both commit D on develop and commit F on feature, because it's different between the two branch-tips.

But now we also know that file D2 is in both commits and is identical in both commits. We know this because we proved, from the git status output, that D2 is in commit F. If D2 were not in commit D, we'd see it as deleted when seeing how to change from commit F to commit D. We don't see that, so it must not be deleted. Moreover, it must be the same in both commits: if it were different, we'd see it as modified when seeing how to change from F to D.

The same goes for file M. We don't see it in the git diff --name-status output, so it must exist in commit D at the tip of develop, and it must be the same there as it is here in feature.
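The inference above can be checked in a scratch repository. In this sketch the two branches are built so that only D1 differs between their tips; for simplicity develop is a direct ancestor of feature here, which is an assumption that differs from the drawing but does not affect a tip-to-tip diff.

```shell
#!/bin/sh
# Sketch: a file absent from `git diff --name-status A B` output,
# but present in A, is present and identical in B.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop   # name the first branch develop
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'version on develop\n' > D1
printf 'same in both\n' > D2
printf 'same in both\n' > M
git add D1 D2 M
git commit -q -m "tip of develop"

git checkout -q -b feature
printf 'feature version of D1\n' > D1
git commit -q -a -m "tip of feature"

diff_out=$(git diff --name-status feature develop)
in_feature=$(git ls-tree --name-only feature)
in_develop=$(git ls-tree --name-only develop)

printf 'diff:\n%s\nfeature has:\n%s\ndevelop has:\n%s\n' \
    "$diff_out" "$in_feature" "$in_develop"
```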

The git checkout

> When I ran git checkout develop, git added back file "D1" to the working tree, but did nothing about "D2" and "M". Why did git treat "D1" and "D2" differently?

To do its job, git checkout develop needs to switch the work-tree contents, to some extent, so that they match commit D. It begins by comparing the current index to the target commit.

We know that the current index has files D1, D2, and M in it. We also know that the version of D1 in the index matches that of commit F and therefore differs from the version of D1 in commit D.

This means that for git checkout develop to succeed, it will have to extract the version of file D1 from commit D, to the index, and hence on to the work-tree. That won't clobber any existing file, as the work-tree version of D1 is just gone. Git puts that operation in a queue of "work to do" before actually doing it, then goes on to look for more work, since it needs to make sure that the whole checkout is OK, then do everything all at once, or else stop and do nothing.

What else does Git need to change in the index? Well, it looks over the index contents for file D2, but that's the same in F and D, so to switch from F to D, there's nothing to do. It also looks over the index contents for file M, but again, that's the same in both F and D: there's nothing to do there.

(It also must compare any and all additional files in commit D and/or in the current index, but those all match as well.)

So, having now determined that no files will be clobbered, Git goes back to the work-queue, which says "extract file D1 from commit D into the index and the work-tree". It does that, then updates .git/HEAD to refer to develop (which then points to commit D), and the checkout is complete.
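The whole scenario can be replayed in a scratch repository. This is only a sketch under assumptions: the file contents and identity settings are invented, and develop is a direct ancestor of feature rather than a diverged branch, which should not matter because the checkout compares the contents of the two tip commits.

```shell
#!/bin/sh
# Sketch: replay the question's scenario and observe what checkout touches.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'version on develop\n' > D1
printf 'same in both\n' > D2
printf 'same in both\n' > M
git add D1 D2 M
git commit -q -m "tip of develop"

git checkout -q -b feature
printf 'feature version of D1\n' > D1
git commit -q -a -m "tip of feature"

# Work-tree changes on feature; nothing staged.
rm D1 D2
printf 'local edit\n' >> M
before=$(git status --short)

# Only D1 differs between the tips, so only D1 is queued for extraction.
git checkout -q develop
after=$(git status --short)
d1_content=$(cat D1)
branch=$(git symbolic-ref --short HEAD)

printf 'before:\n%s\nafter (on %s):\n%s\nD1: %s\n' \
    "$before" "$branch" "$after" "$d1_content"
```

After the checkout, D1 is back with the develop version, while D2 is still missing and M still carries the local edit.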

> And why didn't it complain about "M" being modified - even though it's not committed yet?

It probably should, in some theoretical sense, and some versions of Git seem to do that in at least some cases. However, as we saw, the index version of M matches in both F and D, so git checkout does not have to change M at all. Your particular version of Git not only doesn't bother changing M, it does not even bother looking at it, to see that M is modified. So it doesn't print M M.

Anyway, that's the answer: git checkout finds the minimum amount of work it can do. Since commits D and F differ only in the contents of file D1, that's the minimum work: change the contents of D1 as stored in the index and work-tree.

Sometimes there's more work. Besides changing file contents, sometimes git checkout must create a file (one that shows as Added in the --name-status diff from "current commit" to "proposed checkout"). Sometimes git checkout must remove a file (one that shows as Deleted in that same diff). And of course, it's possible that instead of simply not existing, these files might exist in the work-tree (whether or not they're in the index) with contents that don't match one or both commits. In these cases Git has to be extra-careful with work-tree contents. (This is when .gitignore starts to matter: if an untracked file is listed in .gitignore, Git feels free to clobber it!)
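For contrast, here is a sketch of a case where Git is careful and stops: the file that differs between the two commits exists in the work-tree with uncommitted edits, instead of being deleted. The repository layout and file contents are invented for illustration.

```shell
#!/bin/sh
# Sketch: checkout stops when it would overwrite an uncommitted edit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'version on develop\n' > D1
git add D1
git commit -q -m "tip of develop"

git checkout -q -b feature
printf 'feature version of D1\n' > D1
git commit -q -a -m "tip of feature"

# This time D1 is modified in the work-tree rather than deleted.
printf 'uncommitted edit\n' >> D1

# Switching would have to replace D1, losing the edit, so Git stops
# and changes nothing: HEAD stays on feature and D1 keeps the edit.
if git checkout -q develop 2>/dev/null; then
    result=switched
else
    result=refused
fi
branch=$(git symbolic-ref --short HEAD)
last_line=$(tail -n 1 D1)

printf 'result: %s, still on: %s, last line of D1: %s\n' \
    "$result" "$branch" "$last_line"
```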

Upvotes: 1
