educob

git checkout -b newBranch, and then git checkout master, didn't undo the changes in newBranch

This is the first time this has happened to me, and I am quite shocked. Yesterday I created a new branch with git checkout -b recipients and checked where I was with git branch. I made some changes, and then I wanted to go back to master with git checkout master. But the changes were still there, even though I was back on master.

What did I do wrong?

Thanks.

Update 1: Now that I think about it, I neither committed nor stashed, but Git didn't complain (as I think it does in those circumstances).

Answers (1)

torek

You need to re-adjust your mental model. You're thinking of Git as working with files but that's not how Git is inside. Internally, Git works with commits. To see how this all works—and thus make sense of what has just happened—you have to know a lot about commits and how Git makes new ones.

So:

  • Each commit has a unique number. These numbers aren't simple counting numbers: we don't have commit #1 followed by 2, 3, and so on. Instead, each number is a big, ugly, and random-looking hash ID, such as e1cfff676549cdcd702cbac105468723ef2722f4. The commits are stored in a big key-value database with the commit's unique number as the key, and the commit's contents as the value.

  • The numbers aren't actually random at all. Instead, they're cryptographic checksums of whatever is inside the commit. Git can find any commit—or any other internal Git object—by its hash ID. However, because the ID is a checksum of the content, the content is automatically read-only. If you were to take one out of the database, fuss with it to change even just one single bit, and put the result back, what you would get is a new and different commit with a new and different key. The old commit is still there, under its original key.

  • What's inside a commit comes in two parts. One part is a complete snapshot of all the files that Git knows about. We'll come back to this in a moment, but for now, let's note that each file itself is stored in a read-only, compressed, and de-duplicated format, that only Git itself can read. The other part of a commit is its metadata: information about the commit, such as who made it, when, and so on. As part of this metadata, Git stores the raw hash ID—the key in the key-value database—for the previous commit.
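To make the "the key is a checksum of the content" idea concrete, here is a minimal Python sketch of how Git computes the hash ID of one file's contents (a blob object). Git hashes a short header, the word blob, a space, the size in bytes, and a NUL byte, followed by the raw bytes. In a default SHA-1 repository this should match what git hash-object prints. Commits and trees are hashed the same way with different headers; this sketch leaves them out, and the function name git_blob_id is just an illustrative name.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Hash ID Git assigns to a file's contents in a SHA-1 repository.

    Git hashes the header b"blob <size>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"test content\n"))   # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(git_blob_id(b"test content!\n"))  # one extra character: a completely different ID
print(git_blob_id(b""))                 # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID is computed from the bytes, "changing" stored content is impossible in practice: any change produces a different key, and the object under the old key is untouched.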

There are a bunch of consequences of these three facts, but right now the one we care about the most is the read-only-ness of the commit and all of its stored files. If the files in a commit literally can't be changed—and they can't—then how do we get files we can change? This is where Git's index or staging area comes in, and where your working tree comes in.

Your working tree

The copies of files that you can see and fiddle with aren't in Git at all. They are in your working tree. These are plain, ordinary, everyday files. When you first clone a repository, your working tree is completely empty. Git extracts the files from some commit and fills your working tree, and now you have files. This also takes care of the fact that the files inside the commit aren't readable by non-Git programs.¹

After you tell Git to extract files into your working tree, these files, and the entire tree itself for that matter, are yours to do with as you will. Git doesn't actually use these files until you tell it to do so. You might expect git commit to use them, but it doesn't,² as we will see in the next section.


¹ There are actually multiple internal formats, and one of them (called a loose object) isn't difficult at all, so some programs could or can read these files directly. The packed form of objects, however, is more complicated. A few programs can read pack files, but they all seem to be oriented toward using or working directly with Git. It's much simpler just to let Git deal with these.

² Some options and arguments do make git commit use working tree files. We'll cover that lightly below.


Git's index

Other version control systems stop at this "two copies of a file" thing: there is a committed version, in some internal format, that literally can't change, and a plain-file version that you can work with. Their "commit" verb, however it may be spelled, uses the plain files. But Git doesn't do that: instead, Git adds a third version, sort of in between the committed copy and the working tree copy.

This third "copy"—the word "copy" is in quotes here because it's in the internal, de-duplicated format, so it's actually automatically shared—of each file is in something that Git has three names for. This thing is called the index—a name that has no real meaning—or the staging area, which refers to the way you use it. The third name, which is mostly just seen in --cached options these days, is the cache, which refers to the way Git uses this thing internally to make Git go fast.

Initially, the index (or staging area) holds the same copy of every file from the current commit. This copy, being in the internal format, is ready to go into the next commit you will make. But it's not actually a commit, so unlike a real commit, it's not read-only. You can't change an existing file that's in it, but you can put a new copy of a working tree file into it.

If that doesn't make sense, think of it this way instead: when you run git add file, what Git does is read the working tree version of the named file. Git compresses this down into the read-only, de-duplicated format, right at the time you run git add. If that file is already in any other commit, Git just re-uses the frozen copy. If not, Git has now prepared a freeze-able copy.³ Either way, the index is still ready to commit.
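Here is a toy model of that git add step in Python. ToyRepo and its three dictionaries are hypothetical stand-ins, not Git's real data structures: objects plays the part of the object database, index maps each path to a hash ID, and worktree holds the ordinary, editable files. The point it shows is the de-duplication: two files with identical contents end up sharing one stored object.

```python
import hashlib

class ToyRepo:
    """Toy model (not Git's real structures) of what git add does."""

    def __init__(self):
        self.objects = {}   # hash ID -> frozen file contents
        self.index = {}     # path -> hash ID of the staged copy
        self.worktree = {}  # path -> ordinary, editable contents

    def store(self, content: bytes) -> str:
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        oid = hashlib.sha1(header + content).hexdigest()
        # Identical content always gets the same key, so storing it
        # again simply re-uses the existing object.
        self.objects.setdefault(oid, content)
        return oid

    def add(self, path: str) -> None:
        # Read the working-tree copy, freeze it, point the index at it.
        self.index[path] = self.store(self.worktree[path])

repo = ToyRepo()
repo.worktree["a.txt"] = b"same\n"
repo.worktree["b.txt"] = b"same\n"
repo.add("a.txt")
repo.add("b.txt")
print(len(repo.objects))                           # 1: both paths share one object
print(repo.index["a.txt"] == repo.index["b.txt"])  # True
```

Editing a.txt afterwards changes nothing in the index until you run add again, which is exactly why the index is independent of your working tree.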

So the index, or staging area if you prefer that term, is always ready to go,⁴ to make the next commit. A good way to think about it, then, is that the index holds your proposed next commit. This proposal is independent of your working tree. Git will, at git commit time, use whatever is in the index right then to make the new commit. The index holds all the files that Git knows about, in ready-to-commit form.

There are some commit options that fiddle with this model a bit. In particular, git commit -a and git commit --include file are shorthand for "update the index as if I ran git add, then commit using the updated index". There's also git commit --only file, which is more complex, and we won't cover it here.⁵ But thinking of the index / staging area as the "proposed next commit" really does work, so do that.
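To see that a commit's snapshot comes from the index rather than from the working tree, here is another toy Python model. ToyRepo, add, commit, and commit_all are hypothetical names, and the commit IDs are toy hashes rather than real Git commit objects. commit_all is a rough stand-in for git commit -a: it refreshes the index entry of every file Git already knows about (dropping entries for deleted files) and then commits. Like the real option, it does not pick up brand-new, untracked files.

```python
import hashlib

class ToyRepo:
    """Toy model: commits, index, and working tree as plain dicts."""

    def __init__(self):
        self.commits = {}   # commit ID -> (snapshot, parent ID)
        self.index = {}     # path -> contents: the proposed next commit
        self.worktree = {}  # path -> contents you can freely edit
        self.head = None    # ID of the current commit

    def add(self, path):
        # Copy the working-tree version into the index.
        self.index[path] = self.worktree[path]

    def commit(self):
        # The snapshot is taken from the index, NOT from the working tree.
        snapshot = dict(self.index)
        # Toy ID: hash of snapshot plus parent (real Git hashes a commit object).
        data = repr((sorted(snapshot.items()), self.head)).encode()
        key = hashlib.sha1(data).hexdigest()
        self.commits[key] = (snapshot, self.head)
        self.head = key
        return key

    def commit_all(self):
        # Rough model of "git commit -a": re-stage tracked files, then commit.
        for path in list(self.index):
            if path in self.worktree:
                self.index[path] = self.worktree[path]
            else:
                del self.index[path]  # deleted files are staged as removals
        return self.commit()

repo = ToyRepo()
repo.worktree["f.txt"] = "v1"
repo.add("f.txt")
first = repo.commit()

repo.worktree["f.txt"] = "v2"  # edited, but NOT added
repo.worktree["g.txt"] = "new"
repo.add("g.txt")              # only g.txt is staged
second = repo.commit()
print(repo.commits[second][0]["f.txt"])  # v1: the index still had v1

repo.worktree["h.txt"] = "untracked"     # never added anywhere
third = repo.commit_all()
print(repo.commits[third][0]["f.txt"])   # v2
print("h.txt" in repo.commits[third][0]) # False: -a skips untracked files
```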


³ Technically, git add just makes a new-or-reused blob object out of the file, then writes the blob's hash into the index. If the blob object was already in the database, Git re-uses the existing blob; otherwise there's a new blob object in the database. If you end up never committing it, that blob object clutters up the database for a while, but eventually git gc finds it and strips it out. Normally, nobody cares about the minor bit of bloat here, but occasionally someone will git add a multi-gigabyte file by accident, and that's somewhat painful.

⁴ During a conflicted merge, the index expands. This expanded index isn't ready to commit. This is a pain point in current Git, because any conflict ties up the index-and-working-tree pair until the conflict is resolved or the in-progress operation is aborted. However, git worktree add gets around most of the pain.

⁵ These work by making one or two temporary index files. Git will then update the temporary index files and use those while committing. The details can get quite sticky. Normally you don't need to care, but if you choose to write a Git pre-commit hook, it's important to know all the details.


A short section on branch names

The above is all we need to explain what Git just did, but it can leave you puzzling over several things, so let's go over a bit more before we finish up. Go back to the three facts about commits at the top. Reread the third bullet point if needed: each commit stores the actual hash ID of its previous commit. Git calls this the parent of the (implied to be child) commit.

What this means is that commits form backward-looking chains. If we know the hash ID of the last commit in some chain, that's all we need. Consider, for instance, this drawing of a simple chain of commits where we've used single uppercase letters to stand in for the real hash IDs:

... <-F <-G <-H

H is the last commit in this chain. Git can use its hash ID—the real one, the big ugly unique one that's not actually the letter H—to look up the commit's content, getting both the snapshot of all files, and the metadata. The metadata contain one hash ID, which is the actual hash ID of earlier commit G. G is H's parent; H is a child of G.

So, given the hash ID of H, Git can use it to find the hash ID of G, which lets Git fetch both the full snapshot of G, and the hash ID of G's parent, F. That lets Git get at both the full snapshot of F and the hash ID of F's parent. This repeats, all the way back to the very first commit in the repository—which has no parent, because it can't have one—and that is the history in the repository, as seen by starting at H and working backwards.
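That backward walk is simple enough to sketch in a few lines of Python. The parents table is a hypothetical stand-in for the commit database, using letters in place of real hash IDs and pretending F is the very first commit, so it has no parent. Git's own version of this walk is what git log and git rev-list do; they also handle merge commits with more than one parent, which this sketch ignores.

```python
# Hypothetical commit table: commit ID -> parent ID (None for the root).
parents = {"F": None, "G": "F", "H": "G"}

def history(tip):
    """Yield commits from tip backward, following one parent at a time."""
    commit = tip
    while commit is not None:
        yield commit
        commit = parents[commit]

print(list(history("H")))  # ['H', 'G', 'F']
```

Note that nothing here needs to know about G or F in advance: given only H, everything else is found through the stored parent IDs.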

The key to all of this, though, is that we have to know the actual hash ID of commit H. Where can we get this hash ID? We could write it down: we could say "I'd like a branch that ends at commit H" and hence write, on paper or a whiteboard or whatever, the actual hash ID of H. But we have a computer. Why not have the computer write down this hash ID?

That's what a branch name is: it's just a place to store one hash ID. A branch name, whether it's master or develop or recipients or whatever, just stores a hash ID. That hash ID is, by definition, the last commit in that chain. Even if there are more commits after it, it's still the last one for that branch.
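In a repository using Git's traditional "files" ref storage you can see this directly: the branch name master is usually a small text file, .git/refs/heads/master, holding one hash ID, or a single line in .git/packed-refs once Git has packed its refs. The Python sketch below (read_branch is a hypothetical helper) looks in those two places, with the loose file taking precedence. It skips symbolic refs, the newer reftable format, and other edge cases; in real use, git rev-parse master is the command to ask.

```python
import tempfile
from pathlib import Path

def read_branch(git_dir: Path, name: str):
    """Return the hash ID stored under branch `name`, or None if absent.

    Simplified: a loose ref file wins if present; otherwise look for a
    matching line in packed-refs. Ignores symbolic refs, reftable, etc.
    """
    loose = git_dir / "refs" / "heads" / name
    if loose.is_file():
        return loose.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue  # header comment, or a peeled-tag line
            oid, _, ref = line.partition(" ")
            if ref == "refs/heads/" + name:
                return oid
    return None

# Demo on a throwaway directory laid out like a tiny .git directory.
EXAMPLE_ID = "e1cfff676549cdcd702cbac105468723ef2722f4"
with tempfile.TemporaryDirectory() as d:
    git_dir = Path(d)
    heads = git_dir / "refs" / "heads"
    heads.mkdir(parents=True)
    (heads / "recipients").write_text(EXAMPLE_ID + "\n")
    (git_dir / "packed-refs").write_text(
        "# pack-refs with: peeled fully-peeled sorted\n"
        + EXAMPLE_ID + " refs/heads/master\n"
    )
    found_master = read_branch(git_dir, "master")          # from packed-refs
    found_recipients = read_branch(git_dir, "recipients")  # from the loose file
    found_missing = read_branch(git_dir, "no-such-branch")

print(found_master == found_recipients == EXAMPLE_ID)  # True
print(found_missing)                                    # None
```

Both names here store the same hash ID, which is exactly the situation in the question: two names, one commit.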

When you create a new branch name, you pick some existing commit and make a new name for that commit. So if we're on master now, we might have:

...--G--H   <-- master (HEAD)

If we now create a new branch, without specifying some particular commit to use, Git will use the current commit, which is commit H here, and make a new name for it:

...--G--H   <-- master, recipients (HEAD)

The funny (HEAD) annotation here tells us (and Git) which of these names is the current name. The name itself tells us (or at least Git) which commit is the current commit. (Humans tend not to pay any attention to the big ugly hash IDs because they're so big and ugly, and besides, the names do all the work.)
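Here is that name-creation step as a toy Python model. ToyRefs and checkout_new_branch are hypothetical names: branches maps each name to a commit ID, and head holds a branch name, much as the real .git/HEAD file normally holds a line like ref: refs/heads/master. Creating the branch copies the current commit's ID into the new name and moves HEAD; no commit, index entry, or working-tree file is touched.

```python
class ToyRefs:
    def __init__(self, branch, commit_id):
        self.branches = {branch: commit_id}  # branch name -> commit ID
        self.head = branch                   # HEAD holds a branch *name*

    def current_commit(self):
        # HEAD -> branch name -> commit ID
        return self.branches[self.head]

    def checkout_new_branch(self, name):
        # Like "git checkout -b name" with no start point: the new name
        # selects the current commit, then HEAD is attached to it.
        self.branches[name] = self.current_commit()
        self.head = name

refs = ToyRefs("master", "H")
refs.checkout_new_branch("recipients")
print(refs.branches)          # {'master': 'H', 'recipients': 'H'}
print(refs.head)              # recipients
print(refs.current_commit())  # H
```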

So now you can see what you did

You had some commit—we can't see its hash ID in your question, but it has some unique hash ID—as your current commit as found by your current branch name, which was master. You then used git checkout -b recipients to add a new branch name, without changing anything in either Git's index or your working tree. Both branch names selected the same commit.

Then, you ran git checkout master. This told Git: start using the name master as the current name. Git figured out that the current commit was H (or whatever letter you like, or use the real hash ID, but it's so big and ugly) and that it should instead use ... commit H, because master selected the same commit.

If you had checked out a different commit, which would happen if the other branch name selected some other commit, things might have been different. But you were staying on the same commit; you were just changing which name was to be used to find commit H. So Git didn't need to update its index or your working tree, and your uncommitted changes stayed exactly where they were.
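The whole answer can be condensed into a toy Python model. It is a simplified sketch, not Git's actual algorithm: ToyRepo and checkout are hypothetical, and the safety rule (refuse only when a file that differs between the two commits also has uncommitted changes) is a rough approximation of the behavior described in footnote 6. Because master and recipients select the same commit, the set of files to replace is empty, so the edited working-tree file survives the switch.

```python
class ToyRepo:
    def __init__(self, commits, branches, head):
        self.commits = commits    # commit ID -> snapshot {path: contents}
        self.branches = branches  # branch name -> commit ID
        self.head = head          # current branch name
        snapshot = commits[branches[head]]
        self.index = dict(snapshot)     # starts as a copy of the current commit
        self.worktree = dict(snapshot)  # so does the working tree

    def checkout(self, name):
        old = self.commits[self.branches[self.head]]
        new = self.commits[self.branches[name]]
        # Only files that differ between the two commits need replacing.
        changed = {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}
        for p in changed:
            # Simplified safety check: don't clobber uncommitted work.
            if self.index.get(p) != old.get(p) or self.worktree.get(p) != old.get(p):
                raise RuntimeError(f"local changes to {p} would be overwritten")
        for p in changed:
            if p in new:
                self.index[p] = self.worktree[p] = new[p]
            else:
                self.index.pop(p, None)
                self.worktree.pop(p, None)
        self.head = name  # HEAD now names the other branch

repo = ToyRepo(
    commits={"H": {"list.txt": "alice\n"}},
    branches={"master": "H", "recipients": "H"},
    head="recipients",
)
repo.worktree["list.txt"] = "alice\nbob\n"  # uncommitted edit on recipients
repo.checkout("master")                       # same commit H: nothing to replace
print(repo.head)                              # master
print(repo.worktree["list.txt"] == "alice\nbob\n")  # True: the edit is still there
```

With two branches that select different commits, the same method would replace the differing files, or refuse if one of them carried uncommitted edits.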

If you had:

          I--J   <-- branch1
         /
...--G--H   <-- master
         \
          K--L   <-- branch2

in your repository, there would be three different names you could use, all of which name different "last commit"s: the name master selects commit H, the name branch1 selects commit J, and the name branch2 selects commit L.

Note that commits up through and including H are on all three branches, even though H is the last commit on master. Commits I-J are only on branch1 and commits K-L are only on branch2.
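The question "which branches contain this commit?" can be answered by walking backward from each branch tip, as in this Python sketch. The parents and branches tables are hypothetical stand-ins for the drawing, with G treated as the root commit. The real command for this is git branch --contains, which also copes with merges; this sketch follows only one parent per commit.

```python
# Hypothetical stand-ins for the drawing: commit -> parent (G is the root).
parents = {"G": None, "H": "G", "I": "H", "J": "I", "K": "H", "L": "K"}
branches = {"master": "H", "branch1": "J", "branch2": "L"}

def reachable(tip):
    """All commits found by starting at tip and following parents back."""
    found = set()
    while tip is not None:
        found.add(tip)
        tip = parents[tip]
    return found

def branches_containing(commit):
    return sorted(name for name, tip in branches.items() if commit in reachable(tip))

print(branches_containing("H"))  # ['branch1', 'branch2', 'master']
print(branches_containing("J"))  # ['branch1']
print(branches_containing("K"))  # ['branch2']
```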

Git allows you to move your branch names around at any time. Some name-motions are "more special" than others. When you pick a branch to use (with git checkout), Git has to copy that commit's files into both its own index (so that they're ready to go into a new commit) and your working tree;⁶ but once that's done, adding a new commit makes the current name point to the new commit. So that's a very common way for a branch to grow: you check it out, then add new commits. The name doesn't move by itself, though: you have to actually make a new commit first, or else use one of Git's "change the commit selected by a branch name" commands.
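A toy Python model of "add a new commit and the current name moves": ToyRepo and commit are hypothetical, and the caller supplies a letter instead of a real hash ID. Only the branch that HEAD names is updated; every other name keeps selecting the commit it selected before.

```python
class ToyRepo:
    def __init__(self):
        self.parents = {"G": None, "H": "G"}                # commit ID -> parent ID
        self.branches = {"master": "H", "recipients": "H"}  # name -> commit ID
        self.head = "recipients"                            # current branch name

    def commit(self, new_id):
        # The new commit's parent is whatever the current name selects...
        self.parents[new_id] = self.branches[self.head]
        # ...and then only the current name moves forward to the new commit.
        self.branches[self.head] = new_id

repo = ToyRepo()
repo.commit("I")
print(repo.branches)      # {'master': 'H', 'recipients': 'I'}
print(repo.parents["I"])  # H
```

Had you committed on recipients before switching, master would still select H, and checking it out would then have been a real change of commit.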


⁶ This "copy files out of the commit" step, which updates Git's index and your working tree, generally requires that both the index and your working tree be "clean". The word clean here is not very well defined, and sometimes you can switch branches, even though they select different commits, without a clean index / staging area and/or working tree. For more about this, see Checkout another branch when there are uncommitted changes on the current branch.
