crafter

Reputation: 74

What is the difference between commit --amend and reset --soft

I'm asking this to understand how Git works: what is the essential difference between commit --amend and reset --soft?

Investigation process in steps:

vim index.js > edit > save
git add index.js
git commit -m '...'
git push origin

Now I need to rewrite the commit I pushed previously. For that I would use:

vim index.js > edit > save
git add index.js
git commit --amend --no-edit
git push --force origin

Essentially I've got another SHA-1 and, for the sake of analogy, three objects in the .git/objects directory, but git log shows me two SHA-1s, and I totally agree with that because the commit has been amended.

Let's go back a bit. Suppose that instead of git commit --amend I run git reset --soft HEAD~ (the branch moves back one commit while the staged version of the file is kept), write some code, and execute:

git add index.js > git commit -m '...' > git push --force origin

The .git/objects directory again contains one more SHA-1, and the history has been amended with the new commit.

So what I'm saying is that the commands git commit --amend and git reset --soft have the same behavior.

Or am I wrong?

Upvotes: 3

Views: 1744

Answers (2)

torek

Reputation: 487735

What is the essential difference between commit --amend and reset --soft

One does a commit, the other does a reset? 😀

This is a deliberate sort of joke answer, but it's actually the right one as well. They are simply different operations.

To understand all of this properly, you need to understand Git's index, how Git actually makes commits, and the way branch names work. It helps to start with a clear definition of what a commit is and does.

Git is all about commits; each branch name just points to one commit

People new to Git often think Git is about files, or branches. It's definitely not about files (though it does store them), and the word branch is ambiguous; Git isn't really about branches in the sense most newcomers mean, either. Git is really all about commits. The commit is the fundamental user-oriented unit of storage in Git.[1]

Each commit stores a snapshot—a full and complete copy of all of your files—plus some metadata, which provides information about the commit: who made it and when, for instance. Each commit gets a unique hash ID, reserved forevermore to mean that commit. Every Git repository everywhere agrees that that hash ID means that commit, and either a repository has that commit and therefore has that hash ID, or it doesn't, and doesn't. Once made, no commit—in fact, no internal Git object at all—can ever change either. So the snapshot is frozen forever in time.

Every commit can contain, in its metadata, the hash ID(s) of some earlier, pre-existing commit(s). These are the commit's parent commits. Most commits store just one parent hash ID. These linkages, from child backwards to parent, form chains:

... <-F <-G <-H ...

If we're at commit H, we can read out its parent hash ID G. This allows Git to look up commit G; G contains the hash ID of its parent, F. This allows Git to look up commit F, which contains its parent hash ID. By repeating this process, Git can work all the way from the last commit back to the first one.[2]
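The backwards walk is easy to watch in a scratch repository. This is only a sketch: the temporary directory, the demo identity, and the empty commits named F, G and H are stand-ins chosen for illustration.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Three empty commits standing in for F, G and H.
for name in F G H; do
    git commit -q --allow-empty -m "$name"
done

# Read the parent hash ID stored in H, then the one stored in G.
h=$(git rev-parse HEAD)
g=$(git rev-parse "$h^")
f=$(git rev-parse "$g^")

# git log does the same walk for us; %s prints each commit's subject line.
chain=$(git log --format=%s "$h" | xargs)
echo "$chain"                     # prints: H G F
```

Starting from H alone, Git reaches G and then F purely by following the stored parent hash IDs.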

This means we only need to remember the last commits. There might be more than one "last commit", as in this case:

...--F--G--H   <-- master
         \
          I--J   <-- develop

Note that H is the last commit on master, while J is the last commit on develop. The branch names select these tip commits. From here, Git can work backwards. Note that commit G is on both branches; this is perhaps clearer if we draw H on a row by itself:

          H   <-- master
         /
...--F--G
         \
          I--J   <-- develop

(these two drawings represent the same repository).

When there is more than one branch like this, we need a way to know which one we've checked out with git checkout branch or the newfangled git switch branch. To keep track, we can draw the diagram with the special name HEAD, written in all uppercase like this, attached to one of the branch names:

...--F--G--H   <-- master (HEAD)
         \
          I--J   <-- develop

This drawing means we are on branch master and have commit H checked out.

...--F--G--H   <-- master
         \
          I--J   <-- develop (HEAD)

This drawing means we are on branch develop and have commit J checked out.
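To see which branch name HEAD is attached to, git symbolic-ref can be asked directly. The sketch below builds a small version of the graph above in a temporary repository (the demo identity and the empty commits are assumptions made for the example) and switches between the two names.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git symbolic-ref HEAD refs/heads/master   # name the first branch explicitly
git commit -q --allow-empty -m G
git branch develop                        # develop points to G for now
git commit -q --allow-empty -m H          # master moves on to H
git checkout -q develop
git commit -q --allow-empty -m I
git commit -q --allow-empty -m J          # develop now ends at J

on_develop=$(git symbolic-ref --short HEAD)    # branch name HEAD is attached to
tip_develop=$(git log -1 --format=%s HEAD)     # commit that name selects
git checkout -q master
on_master=$(git symbolic-ref --short HEAD)
tip_master=$(git log -1 --format=%s HEAD)

echo "$on_develop -> $tip_develop, $on_master -> $tip_master"
# prints: develop -> J, master -> H
```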


[1] Commits can be broken down into component parts (tree objects, blob objects, and the underlying commit objects that each refer to a tree object), but this level is not where users work with Git.

[2] Some commits, the ones that Git calls merge commits, contain two or more parent hash IDs. From such a commit, Git works backwards to both (or all) parents, introducing a fork in history. Note how a merge, which brings things together, acts as a diverging point because Git works backwards. Where branches diverge, as Git works backwards, this backwards traversal brings them back together.

At least one commit in every repository has no parent, because it was the first commit ever, and could not point backwards. A commit with no parent is a root commit.


We make new commits from the index

As noted above, commits are frozen for all time: once we make a commit, we can never change it. No part of any commit can ever change.[3] This includes all the files stored inside each commit, in the snapshots. They're not only frozen, they're also stored in a special, read-only, Git-only, compressed format: no other programs on your computer can even read the files.

What this means is that Git must extract the files from a commit to somewhere that they become useful. That somewhere is your work area, which Git calls your working tree or work-tree. Here, your files have their everyday form: they're not frozen, nor compressed; every computer program can use them. You can do whatever you like with your work-tree: it is yours to work with, after all.

Git could make new commits from your work-tree. Other version control systems do this. But Git doesn't. Instead, somewhere between the current commit—the one you checked out, from the branch you checked out, which Git finds with the special name HEAD, as we drew above—and the work-tree, Git stores all of your files in a special area that Git calls, variously, the index, or the staging area, or—rarely these days—the cache.

These three names all refer to the same thing. The index or staging area (I'll call it index here) holds copies of all of your committed files, at least initially. They're in the frozen format, like they are in a commit,[4] but unlike a commit, they're not actually frozen: you can overwrite them.

So, every file has three active versions: the HEAD copy, frozen in the current commit; the index copy, which you can replace at any time; and the work-tree copy, which you can see and work on / with. You edit the work-tree copy, then you run git add file. You have to run git add all the time, and the reason is now clear: each git add you run copies the file, from the work-tree—where it has the everyday form your computer uses—to the index / staging-area, where it's in the frozen form Git likes.
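The three copies can be printed side by side. In this sketch the file name index.js is borrowed from the question, while the temporary repository, the demo identity and the file contents are made up for illustration. git show HEAD:path reads the committed copy and git show :path reads the index copy.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "version 1" > index.js
git add index.js
git commit -q -m "first"          # HEAD copy: version 1

echo "version 2" > index.js
git add index.js                  # index copy: version 2

echo "version 3" > index.js       # work-tree copy: version 3 (not added)

head_copy=$(git show HEAD:index.js)
index_copy=$(git show :index.js)
worktree_copy=$(cat index.js)
printf '%s\n' "HEAD:      $head_copy" "index:     $index_copy" "work-tree: $worktree_copy"
```

Running git commit at this point would store "version 2", because that is what the index holds.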

Now we can see what git commit does, and why it's so relatively fast.[5] All git commit has to do is package up what's already in the index, in the right format, into a new commit. Well, first it has to gather a log message, and add your name and the current date-and-time and all that kind of stuff; and it has to set the new commit's parent hash ID, in the commit's metadata. Then it can make the commit, using the pre-frozen files from the index.

The parent of the new commit is the current commit (except, as we'll see, for --amend). The new commit—say, K—gets written into the collection of all commits, and it points back to the current commit:

...--F--G--H   <-- master
         \
          I--J   <-- develop (HEAD)
              \
               K

and now the magic bit happens: Git writes the new commit's hash ID into the name to which HEAD is attached. In this case, that's develop, so now we have:

...--F--G--H   <-- master
         \
          I--J--K   <-- develop (HEAD)

and K is the latest commit on develop.
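Both halves of that step, the parent link and the branch-name update, can be checked after the fact. A minimal sketch in a temporary repository, with a demo identity and empty commits named J and K assumed for the example:

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git symbolic-ref HEAD refs/heads/develop
git commit -q --allow-empty -m J
j=$(git rev-parse HEAD)           # the current commit before we add K

git commit -q --allow-empty -m K
k=$(git rev-parse HEAD)

parent_of_k=$(git rev-parse "$k^")              # what K points back to
branch_tip=$(git rev-parse refs/heads/develop)  # what the name develop holds now

[ "$parent_of_k" = "$j" ] && echo "K's parent is J"
[ "$branch_tip" = "$k" ] && echo "develop now points to K"
```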


[3] Note that even git commit --amend doesn't change a commit! We will get to what it does do in a moment, but here is a hint. If you do take out a commit, change something, and use that to make a new commit, you get a different commit with a different hash ID. It does not matter what you change (except that different changes result in different hash IDs): any different commit will, by definition in Git, have a different hash ID. Only if you keep every last bit the same (the same snapshot, the same author, the same log message, and the same date-and-time stamp) will you get the original hash ID back. But then you haven't made a new commit: you made the old commit again, with the same parent, same snapshot, same log message, and even the same timestamps. You made the original commit yesterday, and you just made the new one yesterday again: it's the same commit!

[4] Technically, the index holds references to the frozen copies: it just holds a blob hash ID, plus the name of the file, plus a bunch of cached information about the work-tree (hence the name cache). The difference shows up if and when you start poking around with git ls-files --stage and git update-index and the like, to look at or change what's in the index. Except for these cases, though, you can just think of the index as holding a copy of each file.
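For the curious, the blob hash ID that the index records can be compared with the hash Git computes for the file's contents. This is a small sketch in a temporary repository; the file name and contents are made up.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q

echo 'console.log("hi");' > index.js
git add index.js

# Each line of --stage output is: mode, blob hash ID, stage number, path.
staged=$(git ls-files --stage index.js)
echo "$staged"

blob_in_index=$(echo "$staged" | awk '{print $2}')
blob_of_file=$(git hash-object index.js)   # hash of the work-tree contents

[ "$blob_in_index" = "$blob_of_file" ] && echo "index entry refers to this file's blob"
```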

[5] If you've ever used some of the other pre-Git version control systems, you might remember how you can enter a "commit" or "checkout" type of command and go out to lunch because it's going to take many seconds or minutes to work. These days, some people think Git is slow: they don't know slow. 😀


What git commit --amend does

All git commit --amend really does is:

  • write the new commit almost as usual, except
  • instead of using the current commit as the parent of the new commit, use the current commit's parent(s) as the parent(s) of the new commit.

On top of this, it defaults to letting us edit the current commit's commit message, while making the new commit.

So suppose we have:

...--F--G--H   <-- master
         \
          I--J--K   <-- develop (HEAD)

and you realize you forgot to git add a file, or you want to fix up the commit message. You do your forgotten git add if needed, then run:

git commit --amend

Git goes to collect the commit message, but this time it opens the editor on a file holding commit K's commit message. You can edit this if necessary, write it out, and exit the editor, and git commit makes the new commit. But instead of setting its parent to K, it sets its parent to K's parent(s), which means J. This makes a new commit we can call either L or K'; let's use K'. As its last step, git commit writes the hash ID of K' into the current branch name:

...--F--G--H   <-- master
         \
          I--J--K'  <-- develop (HEAD)
              \
               K   [abandoned]

Note that commit K still exists, in the repository. There's just no branch name by which we can find commit K.[6] The name develop now points to commit K' instead.

So, git commit --amend appears to change a commit. But all it really does is shove the commit aside, putting a new-and-improved (well, presumably) replacement in its stead.
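Here is the same sequence as a runnable sketch. The temporary repository, the demo identity, the file contents and the commit messages are assumptions for the example. The amended message is changed on purpose so that K' is guaranteed to get a different hash ID from K.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git symbolic-ref HEAD refs/heads/develop
git commit -q --allow-empty -m J
echo "first try" > index.js
git add index.js
git commit -q -m K
k=$(git rev-parse HEAD)

# Fix the file, stage it, and amend.
echo "second try" > index.js
git add index.js
git commit -q --amend -m "K (amended)"
k_prime=$(git rev-parse HEAD)

k_parent=$(git rev-parse "$k^")
k_prime_parent=$(git rev-parse "$k_prime^")
old_type=$(git cat-file -t "$k")            # K is still in the object database
history=$(git log --format=%s | xargs)      # but it is no longer on develop

echo "K  = $k"
echo "K' = $k_prime"
echo "history: $history"                    # prints: history: K (amended) J
```

K and K' share the parent J, K is still a valid commit object, and git log on develop no longer reaches it.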


[6] We can find K's hash ID in the reflogs: the reflog for develop has it, at develop@{1}, at the moment, and the reflog for HEAD has it, at HEAD@{1}, at the moment. Most Git commands don't look at the reflogs, though, and reflogs are optional, for that matter. The reflog entries eventually expire, and once they are gone, commit K becomes unprotected from Git's Grim Collector, git gc, which garbage collects abandoned and unprotected commits and other lost Git objects.

What this means in the end is that usually, you can get back lost commits for at least 30 days, as that's the default minimum reflog entry keep-time. It's git gc that normally handles all of this—including expiring old reflog entries—and Git runs git gc automatically, occasionally, if and when Git thinks it might be good to do.
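A sketch of fishing the shoved-aside commit back out of the reflogs follows. It assumes reflogs are enabled, which is the default for an ordinary non-bare repository; the temporary repository, the demo identity and the branch name rescued are hypothetical.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git symbolic-ref HEAD refs/heads/develop
git commit -q --allow-empty -m J
echo "work" > index.js
git add index.js
git commit -q -m K
k=$(git rev-parse HEAD)

git commit -q --amend -m "K (amended)"      # develop now points to K'

# One step back in each reflog is the commit we just shoved aside.
from_head_reflog=$(git rev-parse 'HEAD@{1}')
from_branch_reflog=$(git rev-parse 'develop@{1}')

# Giving K a name again protects it from git gc.
git branch rescued "$from_head_reflog"
git log -1 --format='rescued -> %s' rescued   # prints: rescued -> K
```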


git reset moves branch names, and optionally resets the index

The git reset command is considerably more complicated than git commit --amend, mostly because too many separate actions are all stuffed inside the one git reset command. If we ignore most of them, though, and concentrate on git reset's most fundamental mode of operation, what git reset does is to do up to three things:

  1. First, it moves the current branch name. You pick a commit (any commit in your repository, anywhere in the graph) and tell git reset that you want your current branch name, the one HEAD is attached to, to point to that commit. git reset makes that happen.

  2. Then, if you said --soft, git reset stops. Otherwise, it goes on to load the index from the commit you just told it to move to.

  3. Then, if you said --mixed—or didn't say any of these—git reset stops. Otherwise, it goes on to make your work-tree match the update it made to the index.
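The three stopping points can be compared with a small experiment. This is a sketch only: the temporary repository, the demo identity, the file f.txt and the helper function state are all made up for the example. Commit J stores "one" and commit K stores "two"; before each reset everything is put back to K.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > f.txt; git add f.txt; git commit -q -m J
echo two > f.txt; git add f.txt; git commit -q -m K
k=$(git rev-parse HEAD)

# Hypothetical helper: current commit's subject, index copy, work-tree copy.
state() { echo "$(git log -1 --format=%s) $(git show :f.txt) $(cat f.txt)"; }

git reset -q --soft HEAD~1        # step 1 only: move the branch name
soft=$(state)

git reset -q --hard "$k"          # put everything back to K
git reset -q --mixed HEAD~1       # steps 1 and 2: also reload the index
mixed=$(state)

git reset -q --hard "$k"
git reset -q --hard HEAD~1        # steps 1, 2 and 3: also rewrite the work-tree
hard=$(state)

echo "soft:  $soft"               # prints: soft:  J two two
echo "mixed: $mixed"              # prints: mixed: J one two
echo "hard:  $hard"               # prints: hard:  J one one
```

In every case the branch ends up at J; what differs is how far the index and work-tree are dragged along.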

So, if we look at this graph:

...--F--G--H   <-- master
         \
          I--J--K   <-- develop (HEAD)

and run git reset --soft HEAD~1, the commit we selected was J: HEAD~1 means find the commit that HEAD selects (which is K) and step back one, which lands at J. So step 1 of git reset means move develop to J, which gives us this:

...--F--G--H   <-- master
         \
          I--J   <-- develop (HEAD)
              \
               K   [abandoned]

Note how this looks very similar to what we got from git commit --amend, except there's no commit K' here.

We told git reset to reset with --soft, so at step 2, which would reset the index, it just quits instead. The index is left alone. Our work-tree is left alone. If the index matched commit K a moment ago—and it probably did—then it still matches commit K. If our work-tree matched commit K, it still does. (If not, the work-tree doesn't really matter right now.)

If we now run git commit, Git will collect a log message as usual, package up whatever is in the index—which probably still matches K—and make a new commit. Let's call that commit K', and draw it in:

...--F--G--H   <-- master
         \
          I--J--K'  <-- develop (HEAD)
              \
               K   [abandoned]

So in the end, we got here the same thing we would have gotten by git commit --amend: a new commit K' (with whatever hash ID it has) whose parent is J and whose contents are whatever was in the index.
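To compare the two routes, one can run both against the same starting commit and look at what each K' records. In this sketch the temporary repository, the demo identity, the branch name via-reset and the file contents are assumptions for illustration.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git symbolic-ref HEAD refs/heads/develop
git commit -q --allow-empty -m J
echo "draft" > index.js; git add index.js; git commit -q -m K
git branch via-reset              # a second name for K, to try both routes

# Route 1: amend, on develop.
echo "final" > index.js; git add index.js
git commit -q --amend -m "K prime"
amend_parent=$(git rev-parse 'HEAD^')
amend_tree=$(git rev-parse 'HEAD^{tree}')

# Route 2: reset --soft, then an ordinary commit, on via-reset.
git checkout -q via-reset
git reset -q --soft HEAD~1
echo "final" > index.js; git add index.js
git commit -q -m "K prime"
reset_parent=$(git rev-parse 'HEAD^')
reset_tree=$(git rev-parse 'HEAD^{tree}')

[ "$amend_parent" = "$reset_parent" ] && echo "same parent (J)"
[ "$amend_tree" = "$reset_tree" ] && echo "same snapshot"
```

The parent and the snapshot match. Whether the two commits also share a hash ID depends on the remaining metadata, such as the timestamps.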

So --amend and reset-and-commit are the same ... except when they're not

The --amend version is simpler: we run one command. It also lets us amend a merge commit. Suppose, for instance, we have this:

          I--J
         /    \
...--G--H      M   <-- branch (HEAD)
         \    /
          K--L

We can use git commit --amend to shove M aside and make a new commit M', using the index's contents (probably the same as M's snapshot) and a new log message. When we do, we get M' with parents J and L: two parents, i.e., a merge commit. Without --amend it's kind of hard to get a merge commit,[7] and git reset --soft and another commit won't do it.
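The merge case can be checked the same way. The sketch below uses a temporary repository and a demo identity; the branch name side, the file names, and the shortened one-commit sides of the merge are assumptions for the example. %P prints a commit's parent hash IDs.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git symbolic-ref HEAD refs/heads/branch
git commit -q --allow-empty -m H

git checkout -q -b side
echo k > k.txt; git add k.txt; git commit -q -m L
l=$(git rev-parse HEAD)

git checkout -q branch
echo i > i.txt; git add i.txt; git commit -q -m J
j=$(git rev-parse HEAD)

git merge -q --no-ff -m M side    # M has two parents: J and L
m=$(git rev-parse HEAD)
m_parents=$(git log -1 --format=%P "$m")

git commit -q --amend -m "M (amended)"
m_prime=$(git rev-parse HEAD)
m_prime_parents=$(git log -1 --format=%P HEAD)

[ "$m_prime_parents" = "$m_parents" ] && echo "M' kept both parents"
```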

By the same token, though, git commit --amend will only look back one commit. Using git reset --soft, we can make a bunch of commits "go away". Suppose, for instance, we have this:

...--o--*--o   <-- master
         \
          A--B--C--D--E--F--G--H--I   <-- feature (HEAD)

where the whole long chain of A through I were a bunch of experiments. The feature now works and you'd like to have one commit AI that does it all.

There are multiple ways to achieve this,[8] but if you've just made commit I, so that your index and work-tree match commit I, you can now run git reset --soft HEAD~9 to move the name feature so that it points to commit *. Then you can git commit, using the current index (the snapshot from I) to make a new commit AI:

          AI   <-- feature (HEAD)
         /
...--o--*--o   <-- master
         \
          A--B--C--D--E--F--G--H--I   [abandoned]

Commits A through I remain in your repository, findable only through the feature and HEAD reflogs, for another 30+ days, if you need them back; but now git log master..feature shows just the one commit AI. The snapshot in AI matches the snapshot in I but it looks like you did everything all in one amazing commit.
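A sketch of the whole squash, for experimenting safely. The temporary repository, the demo identity, the file feature.txt and the commit messages are made up; the commit named base plays the role of * in the drawing.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m base          # the commit drawn as *

git checkout -q -b feature
for n in A B C D E F G H I; do               # nine experimental commits
    echo "$n" >> feature.txt
    git add feature.txt
    git commit -q -m "$n"
done
i_commit=$(git rev-parse HEAD)
i_tree=$(git rev-parse 'HEAD^{tree}')

git checkout -q master
git commit -q --allow-empty -m o             # master moves on by one commit
git checkout -q feature

git reset -q --soft HEAD~9                   # feature now points to base
git commit -q -m AI                          # index still holds I's snapshot

count=$(git rev-list --count master..feature)
ai_tree=$(git rev-parse 'HEAD^{tree}')
echo "commits in master..feature: $count"    # prints: commits in master..feature: 1
[ "$ai_tree" = "$i_tree" ] && echo "AI has the same snapshot as I"
```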


[7] Git being the tool-set that it is, there are several ways to make a new commit "be" a merge. The most straightforward is to drop below the level of git commit itself, into the component parts that make a new commit, but you can also create a .git/MERGE_HEAD file with hash IDs in it. None of this is designed for everyday, casual use, though.

[8] The usual one is to use git merge --squash, which lets you make AI come after the end of master, but generally uses a new branch name:

             AI   <-- completed-feature (HEAD)
            /
...--o--*--o   <-- master
         \
          A--B--C--D--E--F--G--H--I   <-- feature

Because branch names don't really matter (it's commit hash IDs that matter; branch names just record them for you), it is possible to do all this without using a second branch name. But it's usually not wise to abandon a commit and have to hunt it down in the reflogs; it's too easy to goof things up this way. If you start looking in your reflogs, you'll generally find many twisty little passages, all alike, and it can be very tricky to find the right ones.
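The git merge --squash route from this footnote might look like the following sketch. The temporary repository, the demo identity, the file name, and the three-commit feature branch (shortened from nine) are assumptions for the example.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m base

git checkout -q -b feature
for n in A B C; do
    echo "$n" >> feature.txt
    git add feature.txt
    git commit -q -m "$n"
done
feature_tip=$(git rev-parse HEAD)

git checkout -q master
git commit -q --allow-empty -m o             # master has moved on

# New branch at the end of master; stage the combined change, then commit it.
git checkout -q -b completed-feature master
git merge -q --squash feature >/dev/null
git commit -q -m AI

ai_parents=$(git log -1 --format=%P HEAD)    # a single parent: not a merge commit
count=$(git rev-list --count master..completed-feature)
echo "commits in master..completed-feature: $count"   # prints: ...: 1
```

The name feature still points to the original chain, so nothing has to be dug out of a reflog.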

Upvotes: 1

Tim Biegeleisen

Reputation: 520878

Both commands can be used to achieve a similar end goal, but they don't do the same thing. git commit --amend makes a new commit with whatever is in the stage, and that new commit replaces whatever commit was the previous HEAD of your branch.

On the other hand, git reset --soft HEAD~1 moves the branch that HEAD is attached to back one commit, but leaves both the stage and working directory intact. This has the effect that your branch now appears to be on the previous commit, but all the work you did to generate the old HEAD commit now appears as being staged. If you now change things in the working directory, stage those changes, and commit, you would also be making a new commit to replace the old HEAD, which is a similar result to git commit --amend.

One advantage which git reset --soft has over git commit --amend is that it can reset across multiple commits: the commit you name, for example HEAD~1 or HEAD~4, decides how far back the branch moves. (A bare git reset --soft, with no commit given, defaults to HEAD and so moves nothing.) This can be useful if you want to rewrite, say, the latest 4 commits in your branch. On the other hand, git commit --amend is a one-trick pony, and only works on the HEAD commit.
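A quick experiment shows how the commit argument controls the distance. It is a sketch in a temporary repository with a demo identity; the five commits c1 through c5 and their files are made up. With no commit argument, git reset --soft targets HEAD itself, so nothing moves.

```shell
set -e
cd "$(mktemp -d)"                 # throwaway repository, assumed for the demo
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for n in 1 2 3 4 5; do
    echo "$n" > "f$n.txt"
    git add "f$n.txt"
    git commit -q -m "c$n"
done
tip=$(git rev-parse HEAD)

git reset -q --soft               # no commit named: the target is HEAD
after_default=$(git rev-parse HEAD)

git reset -q --soft HEAD~4        # step back across four commits at once
after_four=$(git log -1 --format=%s)
staged=$(git diff --cached --name-only | xargs)

[ "$after_default" = "$tip" ] && echo "bare reset --soft moved nothing"
echo "now at: $after_four"        # prints: now at: c1
echo "staged: $staged"            # prints: staged: f2.txt f3.txt f4.txt f5.txt
```

A single git commit at this point would replace c2 through c5 with one commit.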

Upvotes: 7
