TKUK

Reputation: 11

git checkout HEAD and git checkout

After issuing the command 'git fetch', do I need to use 'git checkout origin/master file_xyz' or 'git checkout HEAD origin/master file_xyz' to update the changes in the file 'file_xyz' with the copy of the file from origin/master? My current working branch is different from origin/master. Will 'git fetch' update both local repository as well as staging area?

Upvotes: 0

Views: 4561

Answers (1)

torek

Reputation: 489848

The boldface and italics below are mine, and I assume you mean what you are asking, but I also suspect that you don't quite understand what you are implying here.

After issuing the command git fetch, do I need to use git checkout origin/master file_xyz or git checkout HEAD origin/master file_xyz, to update the changes in the file file_xyz with the copy of the file from origin/master?

Neither of these is right, though the first one is closest. You probably want to use git merge, but this operates at the commit level, not the file level.

My current working branch is different from origin/master. Will git fetch update both local repository as well as staging area?

No. Moreover, you really don't want it to update any entries in what Git calls, variously, the index, the staging area, or the cache (depending on who or what part of Git is doing said calling). The job that git fetch has is to call up some other Git, and download from that Git whatever commits they have that you don't. Having downloaded those commits, so that now you have them too, your own Git will update your remote-tracking names, such as origin/master, to remember which commits their branch names, such as their master, identified. See the description of branch below.
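Here is a small sketch you can run in a scratch directory to see this for yourself. Everything in it is made up for illustration: the temporary directory, the two repositories named theirs and mine, the demo identity, and the file contents. It records where HEAD and origin/master point before and after git fetch, and then looks at the work-tree and the index.

```shell
set -eu
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# "Their" repository, with one commit on master.
git -c init.defaultBranch=master init -q "$work/theirs"
cd "$work/theirs"
echo "their first version" > file_xyz
git add file_xyz
git commit -q -m "initial commit"

# "Your" repository is a clone, so origin/master remembers their master.
git clone -q "$work/theirs" "$work/mine"

# They make a new commit after you cloned.
echo "their second, longer version" > file_xyz
git commit -q -am "their update"

cd "$work/mine"
head_before=$(git rev-parse HEAD)
remote_before=$(git rev-parse origin/master)

git fetch -q

head_after=$(git rev-parse HEAD)
remote_after=$(git rev-parse origin/master)
worktree_file=$(cat file_xyz)
pending=$(git status --porcelain)   # lists staged or modified files, if any

echo "HEAD:          $head_before -> $head_after"
echo "origin/master: $remote_before -> $remote_after"
echo "file_xyz in the work-tree: $worktree_file"
```

Only origin/master should move. Your current commit, your index, and your work-tree copy of file_xyz are left alone.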

Changes vs snapshots

Your question starts out by asking about changes in some file. It's worth emphasizing (as many times as it takes!) that Git does not store changes. Git stores snapshots. If you take several snapshots over time, and compare two snapshots, you can find out what changed. This is like one of those "spot the difference" puzzles, and the key realization here is that you must take two snapshots to find some difference. One snapshot is just a snapshot: there are no changes.
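To make the snapshot idea concrete, here is a minimal sketch using a scratch repository (the temporary directory, demo identity, and file contents are all assumptions for illustration). Each commit gives back the whole file when asked; the "change" only appears when two commits are compared.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"

# First snapshot: a one-line file.
printf 'line 1\n' > file_xyz
git add file_xyz
git commit -q -m "first snapshot"

# Second snapshot: the same file with one more line.
printf 'line 1\nline 2\n' > file_xyz
git add file_xyz
git commit -q -m "second snapshot"

# Each commit hands back the complete file, not a delta.
old=$(git show HEAD~1:file_xyz)
new=$(git show HEAD:file_xyz)

# A difference only exists once two snapshots are compared.
added=$(git diff HEAD~1 HEAD -- file_xyz | grep -c '^+line')

echo "older snapshot:"; echo "$old"
echo "newer snapshot:"; echo "$new"
echo "lines added between the two: $added"
```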

Branches and commits

The word branch in Git is ambiguous, or at least, it is the way people use it. It can refer to the name, such as master, or to one specific commit (such as 32a38237f30759f18b72d069aebd81bbde47bbec if that's the tip of master), or even to a whole series of commits ending with that last commit. See What exactly do we mean by "branch"? It's usually clear enough what someone means, but when it isn't, it's a good idea to find out: did they mean the name, the tip commit, or some or all commits reachable from the tip commit?

You said My current working branch is different from origin/master and that's trivially true if, by "working branch", you mean the branch name that git status reports when it says:

On branch develop
Your branch is ...

for instance. That's because you cannot be "on" any remote-tracking name: remote-tracking names, such as origin/master, are not actually branch names, and git checkout won't put you "on" such a name. Instead, you will end up in what Git calls "detached HEAD" mode (which I won't go into here).

If you are on a branch—if git status will say On branch B for some B—then the name B identifies one particular commit, and that commit is your current commit. This could be the same commit that origin/master identifies, before and/or after git fetch. Or, it could be some different commit.

A commit in Git is a unique entity, identified—and accessed—by its hash ID. The hash ID is a big ugly string of letters and digits, such as 32a38237f30759f18b72d069aebd81bbde47bbec, that appears to be completely random (though it's very much not random: it's actually a cryptographic checksum of the contents of the commit). Every commit has its own, different, unique hash ID. Two different Git repositories contain the same commit if they both have a commit with the same hash ID. That's how your Git knows, when you have it call up the other Git, whether it needs to download some commit: the other Git says, e.g., I have commit 32a38237f30759f18b72d069aebd81bbde47bbec and your Git checks and either says: OK, give it to me or Nah, I already have that one.

These hash IDs are pretty much useless to humans, and as a result, we give ourselves names for specific commits. This is where the branch names come in: their Git calls their latest master-branch commit master. Their name master identifies their latest commit. This commit might be in your repository, or it might not be. You run git fetch, your Git calls up their Git, your Git finds out what their hash ID is, and your Git gets the commit if necessary. Now you definitely have it, and now your Git updates your origin/master to remember: Their Git last said that their master was commit __(fill in the blank)__.
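The "do I already have that commit?" check can be imitated by hand with git cat-file -e, which succeeds only if the named object exists in the repository. The sketch below is an illustration, not what git fetch literally runs: has_commit is a hypothetical helper, and the repositories, identity, and file contents are invented for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git -c init.defaultBranch=master init -q "$work/theirs"
cd "$work/theirs"
echo "one" > file_xyz
git add file_xyz
git commit -q -m "first"
git clone -q "$work/theirs" "$work/mine"

# Their master moves on after the clone.
echo "two, with more text" > file_xyz
git commit -q -am "second"
their_tip=$(git rev-parse master)

cd "$work/mine"

# Hypothetical helper: true if this repository has the given commit.
has_commit() {
    git cat-file -e "$1^{commit}" 2>/dev/null
}

if has_commit "$their_tip"; then have_before=yes; else have_before=no; fi
git fetch -q
if has_commit "$their_tip"; then have_after=yes; else have_after=no; fi

# After the fetch, origin/master remembers what their master identified.
remembered=$(git rev-parse origin/master)

echo "had their tip before fetch: $have_before"
echo "have their tip after fetch: $have_after"
echo "origin/master now names:    $remembered"
```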

Every commit stores a full, complete snapshot of all files. These are in a special, compressed, frozen, Git-only form. Because they're frozen, Git can share the files in a series of commits, whenever they are unchanged from the previous commits. You cannot change any part of any frozen entity in Git—including commits. Each commit also stores some metadata, such as who made it, when, and why (the log message).

Commits also store the hash ID of their parent, or previous, commit. This allows Git to start at the end—at the last commit in a branch—and work backwards:

... <-F <-G <-H   <--last

The branch name—such as last—holds the actual hash ID of the commit we'll call H. That lets Git find the commit itself. The commit holds the hash ID of its parent G, which lets Git find G. G holds the hash ID of its parent F, and so on.

This is, ultimately, how and why branches work the way they do in Git. The name identifies the last commit, and the rest of the commits are found by working backwards.
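You can look at this backwards-pointing chain directly. The following sketch builds three commits in a scratch repository (the commit subjects F, G, H and everything else here are invented to match the drawing above) and then reads the parent hash ID stored inside the tip commit.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"

# Three commits in a row, named after the drawing.
for name in F G H; do
    echo "snapshot $name" >> file_xyz
    git add file_xyz
    git commit -q -m "$name"
done

# The branch name holds the hash ID of the last commit.
tip=$(git rev-parse master)

# The raw commit object contains a "parent <hash>" line.
parent_line=$(git cat-file -p "$tip" | grep '^parent ')

# master~1 means "one step back from master", found via that line.
parent=$(git rev-parse master~1)

# git log starts at the tip and follows parents backwards.
subjects=$(git log --format=%s master)

echo "tip:    $tip"
echo "stored: $parent_line"
echo "walk:"; echo "$subjects"
```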

The index and the work-tree

Files inside commits are frozen and Git-only, which is fine for archiving but useless for getting work done. So Git has to have a way to extract files from commits, into an area where you can work on and with them. That area is the work-tree.

This would be good enough: you have a current frozen copy of each file, saved in a commit, and a working copy of each file, in your work-tree where you can use it. You could tell Git: Make a new commit from my work-tree and Git would re-compress every file, compare it to the current compressed one, and see if it needs to save a new copy or can re-use the old copy. When it finally finished compressing every file, it would be ready to make the new commit. This would be effective, but slow—too slow for Linus Torvalds, for instance. So that's not quite what Git does.

Instead, Git keeps a third copy of every file. The third copy is already compressed and ready to go into the next commit. But unlike the copy in the current commit, it's not quite frozen. It's just ready to freeze. If you've done something to the usable copy of the file in your work-tree, you can run git add filename to copy the work-tree copy back into the "ready to go" copy. That file is now staged for commit. Note that it was already there before; it's just that the copy that was there before was the same as the copy that is (still) in the current, frozen commit. The git add process just overwrites it with the new, updated one.

This area holds a copy of every file from the commit that Git extracted into your work-tree, each one ready to go into the next commit (staged, in other words). It is variously called the index, the staging area, or the cache, depending on who or which part of Git is doing the calling. The purpose of the index / staging area is to remember what goes into the next commit.
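The three copies can be compared by their blob hash IDs. This is a sketch in a scratch repository, with invented contents and identity: git rev-parse HEAD:file_xyz names the frozen copy, git ls-files --stage shows the index copy, and git hash-object computes what the work-tree copy would hash to.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"

echo "version one" > file_xyz
git add file_xyz
git commit -q -m "first"

# Hash ID of the index copy of file_xyz (second field of ls-files --stage).
index_blob() {
    git ls-files --stage file_xyz | awk '{print $2}'
}

committed=$(git rev-parse HEAD:file_xyz)   # frozen copy in the commit
staged_clean=$(index_blob)                 # index copy right after committing

# Edit the work-tree copy only; the index copy does not follow along.
echo "version two, somewhat longer" > file_xyz
staged_unadded=$(index_blob)

# git add overwrites the index copy with the work-tree contents.
git add file_xyz
staged_added=$(index_blob)
worktree=$(git hash-object file_xyz)

echo "commit:              $committed"
echo "index after commit:  $staged_clean"
echo "index after edit:    $staged_unadded"
echo "index after git add: $staged_added"
echo "work-tree:           $worktree"
```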

Saving new snapshots

At some point you probably took file_xyz in your work-tree and did something with it. You may now want to save that something for future reference, in a new commit. Once you have, you'll be able to compare the version in the new commit to the version in the commit just before it, to see what you changed. You take the two snapshots (the parent of your new commit, and your new commit), compare them, and spot the difference.

To do that, you run git add file_xyz, and then git commit. The commit command collects your log message (why you did whatever you did), adds your name and the current time and all that, saves the hash ID of the current commit as the new commit's parent, and makes a new commit out of what is in the index:

...--F--G--H   <-- [before your new commit]
            \
             I   <-- [your new commit]

To remember the hash ID of I, your Git now writes that hash ID into the name of the branch you have checked out. If that's master, you now have:

             I   <-- master (HEAD)
            /
...--F--G--H   <-- origin/master

The name HEAD is attached to the current branch, which is how Git knows that it was supposed to update master. That's also how git status knew to say On branch master. You presumably created your master from origin/master—the commit their Git had as their master—so your origin/master still remembers hash ID H.
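The drawing above can be checked with a few plumbing commands. In this sketch (scratch repositories, demo identity, and contents are all assumptions), git symbolic-ref shows which branch name HEAD is attached to, and git rev-parse shows which commit each name identifies before and after the new commit.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Their repository ends at the commit the drawing calls H.
git -c init.defaultBranch=master init -q "$work/theirs"
cd "$work/theirs"
echo "shared content" > file_xyz
git add file_xyz
git commit -q -m "H"

git clone -q "$work/theirs" "$work/mine"
cd "$work/mine"

current=$(git symbolic-ref --short HEAD)   # the branch HEAD is attached to
h=$(git rev-parse HEAD)

# Make the commit the drawing calls I.
echo "my change" >> file_xyz
git add file_xyz
git commit -q -m "I"

i=$(git rev-parse master)
parent_of_i=$(git rev-parse master~1)
remote=$(git rev-parse origin/master)

echo "on branch:     $current"
echo "H:             $h"
echo "master (I):    $i"
echo "parent of I:   $parent_of_i"
echo "origin/master: $remote"
```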

Fetching their new commits

Now you run git fetch. Your Git calls up their Git and obtains their new commit(s). Let's say they have one such:

             I   <-- master (HEAD)
            /
...--F--G--H--J   <-- origin/master

Since their master now identifies commit J, your origin/master also now identifies their new commit J.

Merging changes

At this point you can run:

git merge origin/master

This tells your Git: Find the common starting point commit, the one from which I made some changes and committed, and from which they made some changes and committed. Figure out what changes we both made, and combine them automatically.

So your Git walks backwards from I and from J, and finds that you both started from commit H. This is the merge base.

Your Git then runs, more or less:

git diff --find-renames <hash-of-H> <hash-of-I>   # what we changed
git diff --find-renames <hash-of-H> <hash-of-J>   # what they changed

If you changed file_xyz and they did not, or if they changed it and you did not, combining the changes is easy: Git can just take your version or their version. If you both changed the file, Git does a line-by-line comparison of the merge base version of the file—from commit H—against yours and against theirs. It then combines the two sets of changes, applies the combined changes to the file from H, and uses that for the merge result.
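The merge base and the two diffs can be found by hand before merging anything. This sketch sets up the H / I / J situation in scratch repositories (names, identity, and file contents are invented) and uses git merge-base, which reports the common starting commit for two tips.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Commit H: a five-line file that both sides start from.
git -c init.defaultBranch=master init -q "$work/theirs"
cd "$work/theirs"
printf 'a\nb\nc\nd\ne\n' > file_xyz
git add file_xyz
git commit -q -m "H"
git clone -q "$work/theirs" "$work/mine"

# Commit J: they change the last line.
printf 'a\nb\nc\nd\nE (theirs)\n' > file_xyz
git commit -q -am "J"

# Commit I: you change the first line.
cd "$work/mine"
h=$(git rev-parse HEAD)
printf 'A (mine)\nb\nc\nd\ne\n' > file_xyz
git commit -q -am "I"

git fetch -q

# The common starting point of master and origin/master.
base=$(git merge-base master origin/master)

# Which files each side changed relative to the merge base.
ours_changed=$(git diff --find-renames --name-only "$base" master)
theirs_changed=$(git diff --find-renames --name-only "$base" origin/master)

echo "merge base:   $base"
echo "we changed:   $ours_changed"
echo "they changed: $theirs_changed"
```

Dropping --name-only from the two git diff commands shows the actual line-by-line changes that get combined.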

If all the combining, for all the files, goes well, Git makes a new commit from the result:

             I--K   <-- master (HEAD)
            /  /
...--F--G--H--J   <-- origin/master

Your name master now points to this new commit, which combines their changes and your changes. The new commit has two parents—commit I, your previous branch tip, and commit J, their current branch tip—and has as its contents, including its frozen file_xyz, the combined changes as applied to H, the merge base.
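Here is the whole merge as a runnable sketch, under the same invented setup as the drawing: you change the first line of file_xyz, they change the last line, and the changes are far enough apart that Git can combine them on its own. The repositories, identity, and contents are assumptions for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Commit H.
git -c init.defaultBranch=master init -q "$work/theirs"
cd "$work/theirs"
printf 'a\nb\nc\nd\ne\n' > file_xyz
git add file_xyz
git commit -q -m "H"
git clone -q "$work/theirs" "$work/mine"

# Commit J (theirs) and commit I (yours).
printf 'a\nb\nc\nd\nE (theirs)\n' > file_xyz
git commit -q -am "J"
cd "$work/mine"
printf 'A (mine)\nb\nc\nd\ne\n' > file_xyz
git commit -q -am "I"

git fetch -q
i=$(git rev-parse master)
j=$(git rev-parse origin/master)

# Make merge commit K; --no-edit accepts the default log message.
git merge -q --no-edit origin/master

merged=$(cat file_xyz)
first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)

echo "merged file_xyz:"; echo "$merged"
echo "first parent:  $first_parent"
echo "second parent: $second_parent"
```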

If you really want to discard your own changes

If, after all, you want to throw away your changes to file_xyz and just use theirs, then you want the first command you suggested:

git checkout origin/master file_xyz

This tells your Git: Find the commit origin/master identifies. Reach into that commit and get the frozen contents. Put those frozen contents into my index, so that they will be in my next commit, and unfreeze / de-compress them into my work-tree so that I can see and use them. Your current commit does not change at all, but your index and work-tree now hold that version of the file.
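To see all three effects at once, here is a sketch with invented repositories and contents. After the checkout, HEAD is where it was, while both the index copy and the work-tree copy of file_xyz match the one in origin/master.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git -c init.defaultBranch=master init -q "$work/theirs"
cd "$work/theirs"
echo "base" > file_xyz
git add file_xyz
git commit -q -m "H"
git clone -q "$work/theirs" "$work/mine"

# Their new commit.
echo "their newer version" > file_xyz
git commit -q -am "J"

# Your own commit, which you now want to discard for this one file.
cd "$work/mine"
echo "my version" > file_xyz
git commit -q -am "I"

git fetch -q
head_before=$(git rev-parse HEAD)

# Copy their file_xyz into the index and the work-tree.
git checkout -q origin/master file_xyz

head_after=$(git rev-parse HEAD)
worktree=$(cat file_xyz)
index_blob=$(git ls-files --stage file_xyz | awk '{print $2}')
their_blob=$(git rev-parse origin/master:file_xyz)
state=$(git status --porcelain)

echo "HEAD unchanged: $head_before / $head_after"
echo "work-tree copy: $worktree"
echo "status:         $state"
```

The status line shows the file as staged, because the index copy now differs from the copy in your current commit. Nothing is saved permanently until you run git commit.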

Note that this same trick works if you want to go back to the saved version from commit H:

git checkout <hash-of-H> -- file_xyz

Since there is no obvious name for commit H, you probably want to use its hash ID. The -- here is not required—we didn't need it before—but is a good habit to get into, because while file_xyz does not look like a branch name or git checkout option, other names might. If you have a file named -f, or a file named master, the -- tells git checkout that you are naming a file, not an option or branch.
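Both points can be tried out in a scratch repository. The sketch below is illustrative only: the file deliberately named master, the contents, and the identity are all invented. It first restores file_xyz from an older commit by hash ID, then uses -- to restore a file whose name is the same as the branch name.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"

# Commit H: two files, one of them awkwardly named "master".
echo "old version" > file_xyz
echo "a file, not a branch" > master
git add file_xyz master
git commit -q -m "H"
h=$(git rev-parse HEAD)

# Commit I: a newer file_xyz.
echo "brand new version" > file_xyz
git commit -q -am "I"
i=$(git rev-parse HEAD)

# Go back to the copy of file_xyz saved in H; HEAD stays at I.
git checkout -q "$h" -- file_xyz
restored=$(cat file_xyz)
head_now=$(git rev-parse HEAD)

# Damage the file called master, then restore it from the index.
# The -- says that "master" here is a file name, not the branch.
echo "scribble" > master
git checkout -q -- master
master_file=$(cat master)
branch_now=$(git symbolic-ref --short HEAD)

echo "file_xyz:      $restored"
echo "file 'master': $master_file"
echo "still on:      $branch_now"
```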

Upvotes: 1
