Reputation: 8374

Understanding `git reset --hard`

Say I have a Git repo with the following commits to the master, in order: A, B, C, D. I want to roll back the master to the state that it was in following commit A; in other words, discard the changes from B, C and D. I'm pretty sure that git reset --hard would do that. But then, I would like to selectively re-apply some of the discarded patches (git cherry-pick is what I want for that, right?) So my specific questions are:

Does git reset --hard remove anything from the commit history? If I reset the master to A, will B, C and D still be hanging around in the repo?
Does git cherry-pick allow me to do what I described above, or did I misunderstand it?

Upvotes: 4

Answers (3)

Philippe

Reputation: 31227

Just for information, because that's not your question: you would be better off using git rebase -i than git reset --hard.

Then, during the rebase, you just have to delete the lines of the commits you don't want anymore.

After your reset, chances are that you won't see the commits anymore, and unless you wrote down their SHA-1 hashes somewhere, you will have difficulty cherry-picking them (unless you look at the reflog).
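
A minimal sketch of the reflog route, run in a throwaway repository. The repository, the file names and the commit messages A to D are assumptions made up for illustration. After the reset, `HEAD@{1}` still names the commit that HEAD pointed to just before it, so the hash of D can be recovered without having written it down.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"
for c in A B C D; do
  echo "$c" > "$c.txt"                     # each commit adds its own file
  git add "$c.txt"
  git commit -q -m "$c"
done

a=$(git rev-parse HEAD~3)                  # hash of commit A
d=$(git rev-parse HEAD)                    # hash of commit D, kept only to compare later
git reset -q --hard "$a"                   # master now points to A

git --no-pager reflog -n 2                 # the second entry is the commit that created D
recovered=$(git rev-parse 'HEAD@{1}')      # where HEAD was before the reset
git cherry-pick "$recovered~1"             # re-apply C (the parent of D) on top of A
```

Because every commit here touches a different file, the cherry-pick applies without conflicts; in a real history it may stop and ask you to resolve them.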

Upvotes: 0

Ben

Reputation: 1377

To quickly answer your questions:

Does git reset --hard remove anything from the commit history? If I reset the master to A, will B, C and D still be hanging around in the repo?

git reset --hard does not delete any commits from your local repository. It moves your branch pointer around in preparation for your next commit. Commits that are no longer reachable from any reference will eventually be deleted, but not immediately. You can read more about that topic in the docs for git gc.

For example, after a git reset --hard A, you can immediately recover your "lost" commits with the command: git merge --ff-only D.

Personally, before I do a git reset --hard, I like to mark the current commit with a tag: git tag here. After I am done playing with my history, I can then easily check whether I got the result I wanted by running git diff here..HEAD.
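
The tag-then-reset workflow can be sketched as follows, in a scratch repository whose contents (files A.txt to D.txt, commit messages A to D) are invented for the example. Here the tag `here` stands in for "D" in the `git merge --ff-only` command.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"
for c in A B C D; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
done

a=$(git rev-parse HEAD~3)                  # hash of commit A
git tag here                               # bookmark the current commit (D)
git reset -q --hard "$a"                   # master now points to A

git --no-pager diff --stat here..HEAD      # what changed between the bookmark and the new HEAD
git merge -q --ff-only here                # fast-forward master back to D
```

When you are finished, `git tag -d here` removes the bookmark again.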

Does git cherry-pick allow me to do what I described above, or did I misunderstand it?

git cherry-pick does indeed do what you describe (selectively apply the patches).

Upvotes: 3

torek

Reputation: 490048

To understand git reset properly you need all these bits of information:

The commits themselves exist, in a sense, outside of any branch names.

When you make a commit, Git assigns it a unique hash ID. The new commit you make stores inside it the hash ID of whatever commit was your current commit at the time you made it. We can use these hash IDs to chain the commits together:
```
A <-B <-C <-D
```
We say that each commit points to the previous commit. (Since there was no commit before A, it doesn't point anywhere. If there is one before A, just imagine the chain going back further. It eventually has to end, since no Git repository has an infinite number of commits, and the graph is constrained.)
However, branch names, like master, preserve commits. If there is no name for a commit like D above, D is in danger of being cleaned up and removed by Git's garbage collector, as it seems to be useless. So we add an external name to point to D:
```
A <-B <-C <-D   <-- master
```
Now Git knows that D is in use. Since D points to C, Git knows that C is in use, and so on down through the history.
The special name HEAD usually contains the name of a branch. The branch name itself, such as master, has its usual role of identifying some specific commit (D) and thereby keeping D alive. The name HEAD serves to tell Git which branch-name is to be treated as the current branch.
When you make a new commit with git commit, Git uses the contents of the index to make the new commit. The index, also called the staging area and sometimes the cache, sits "between" the current (HEAD) commit and the work-tree. Hence every file for the current commit has (up to) three versions: the one in HEAD, the one in the index, and the one in the work-tree.

You can copy files back and forth between index and work-tree, and you can copy files out of any commit into the index; but commits are read-only, so you cannot copy from the index into an existing commit. You can only make a new commit from the index.
The work-tree, of course, holds your files in the normal readable/writable fashion, rather than some special Gitty format (as used in the commits themselves and in the index).
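
As an illustration of the points above, the following sketch inspects a small scratch repository. The file name file.txt and its contents v1 to v4 are assumptions chosen for the example. It shows the parent link stored in a commit, the branch name stored in HEAD, and the three copies of one file.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "A"
first=$(git rev-parse HEAD)                # hash of commit A
echo "v2" > file.txt
git add file.txt
git commit -q -m "B"

git cat-file -p HEAD                       # raw commit B: includes a "parent <hash of A>" line
git symbolic-ref HEAD                      # the branch name that HEAD currently holds

echo "v3" > file.txt
git add file.txt                           # the index copy is now v3
echo "v4" > file.txt                       # the work-tree copy is now v4

head_copy=$(git show HEAD:file.txt)        # copy stored in the current commit
index_copy=$(git show :file.txt)           # copy stored in the index
tree_copy=$(cat file.txt)                  # ordinary file in the work-tree
echo "HEAD=$head_copy index=$index_copy work-tree=$tree_copy"
# prints "HEAD=v2 index=v3 work-tree=v4"
```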

What git reset does (in normal modes, --soft, --mixed, and --hard) is to do up to three jobs:

Change something (usually the current branch's stored hash ID) via HEAD. It always does this, but if you use HEAD as the new value, the new value is the same as the old value, so nothing actually changes. (Stop here if --soft.)
Re-set the index. This part is optional: it only happens for --mixed and --hard. Resetting means copy everything from the (now re-set) HEAD into the index. (Stop here if --mixed.)
Re-set the work-tree. This part is optional: it only happens for --hard. Resetting means copy everything from the (now re-set) index into the work-tree.
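
The three jobs can be observed side by side with a sketch like this one, assuming a scratch repository with two commits A and B that store the text "A" and "B" in a hypothetical file.txt. Each round starts from B and resets to A with a different mode.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

echo "A" > file.txt
git add file.txt
git commit -q -m "A"
a=$(git rev-parse HEAD)                    # hash of commit A
echo "B" > file.txt
git add file.txt
git commit -q -m "B"
b=$(git rev-parse HEAD)                    # hash of commit B

for mode in soft mixed hard; do
  git reset -q --hard "$b"                 # start every round from commit B
  git reset -q "--$mode" "$a"              # move master to A using this mode
  echo "$mode: HEAD=$(git show HEAD:file.txt) index=$(git show :file.txt) work-tree=$(cat file.txt)"
done
# prints:
#   soft: HEAD=A index=B work-tree=B
#   mixed: HEAD=A index=A work-tree=B
#   hard: HEAD=A index=A work-tree=A
```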

Now, you mention you want to roll things back to the state they had during commit A. Defining things correctly is the problem here. We can make the branch name point to commit A:

```
A   <-- master (HEAD)
 \
  B--C--D
```

which is done with the first action, which always happens: git reset <hash of A> makes the current branch—presumably master—point to commit A, even if you use --soft. Using --mixed or --hard will also re-set the index, or both index and work-tree.

This immediately un-protects B, C, and D, though. So you should first protect them by adding a name (branch or tag) to remember D, which will protect it. D will then protect C, which will protect B.
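
A minimal sketch of that ordering, in a scratch repository whose files and commit messages are made up for the example. The branch name save is the one this answer uses further down; any unused branch or tag name would do.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"
for c in A B C D; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
done

a=$(git rev-parse HEAD~3)                  # hash of commit A
d=$(git rev-parse HEAD)                    # hash of commit D
git branch save                            # first: a second name that points to D
git reset -q --hard "$a"                   # then: move master back to A

git --no-pager log --oneline --graph --all # B, C and D are still listed, reachable through save
```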

Meanwhile, what you have done here is to make a branch name "move backwards". There's nothing inherently wrong with this, but other people and processes may not expect that to happen. Normally branch names only "move forwards" (we add new commits and make the branch name point to the newest one, which lets us continue accessing the still-protected older commits). So this may not be the right way to do this. (If everyone else who is using this branch name agrees that it moves this way, it's fine. If not, it's not.)

You mention git cherry-pick. What git cherry-pick does is turn a commit into a change (commits themselves are full snapshots, saving whatever was in the index when you ran git commit). It then tries to apply the change wherever you are now. Suppose, for instance, that we do exactly the above git reset --hard, after making a new name save point to commit D:

```
A   <-- master (HEAD)
 \
  B--C--D   <-- save
```

You can now run git cherry-pick <hash-of-C> or git cherry-pick save~1 (both of these will identify commit C). Git will then compare the contents of commit C to the contents of commit B. Whatever changed, Git will attempt to make those changes now to the contents of your index and work-tree. If all that succeeds, Git will commit the result:

```
A--C'   <-- master (HEAD)
 \
  B--C--D   <-- save
```

Here, I call the new commit C' because it resembles C a lot: it makes the same changes as C (but to a different base!), and has the same commit message as C (with a "cherry picked from commit ..." annotation appended if you pass -x).
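
The relationship between C and C' can be checked with a sketch along these lines. It assumes a scratch repository in which each of the commits A to D adds one invented file, so that C applies cleanly on top of A.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"
for c in A B C D; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
done

a=$(git rev-parse HEAD~3)                  # hash of commit A
git branch save                            # keep D (and so C and B) reachable
git reset -q --hard "$a"                   # master now points to A

git cherry-pick save~1                     # copy the change made by C onto A
c_orig=$(git rev-parse save~1)             # the original C
c_new=$(git rev-parse HEAD)                # the copy, C'
subject=$(git log -1 --format=%s HEAD)     # commit message of C'
echo "C=$c_orig C'=$c_new subject=$subject"
```

The two hashes differ because C' has a different parent, while the subject line is carried over from C.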

When you are done cherry-picking and have no use at all for commits B through D, you can simply delete the name that is keeping them around and easy to find. At this point, those three commits really do (well, maybe¹) become eligible to be taken out with the trash when git gc runs.

¹Git tries really hard not to lose commits. As a result, there are many ways that a commit won't be collected quickly, including "reflogs" and age. A commit that is under 14 days old is never pruned by default; a commit that is in a reflog entry is also not pruned; and reflog entries themselves usually stick around for at least 30 days. Deleting the save name throws out the reflog for save itself, but the reflogs for HEAD and master are likely to retain the commits for some time.
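
The footnote's point can be observed with the sketch below, again in a scratch repository with invented contents. It assumes default gc settings: after the save name is deleted, D is no longer reachable from any branch, yet a `git gc` run right away still keeps it because the reflogs for HEAD and master mention it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the branch master regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"
for c in A B C D; do
  echo "$c" > "$c.txt"
  git add "$c.txt"
  git commit -q -m "$c"
done

a=$(git rev-parse HEAD~3)                  # hash of commit A
d=$(git rev-parse HEAD)                    # hash of commit D
git branch save
git reset -q --hard "$a"
git branch -q -D save                      # no branch or tag names D any more

git gc -q                                  # default settings: reflog entries still protect D
kind=$(git cat-file -t "$d")               # object type, if the object still exists
echo "D is still stored as a: $kind"
```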

Upvotes: 16
