Alin Tudor

Reputation: 39

Git remove merge commit from history, but retain the commits with which it has been connected

I have the following structure for the last 6 commits in my history, all on the same branch:

A(merge commit)
|
|\
B \
|  |
|  C
|  |
|  D
|  |
| /
|/
E
|
F

I want to delete A merge commit but I want to keep C and D commits in a linear history before the B commit. I mention that all the commits are pushed to remote. I try to reset --soft HEAD~1 to delete the A merge commit but with that command the other commits, C and D have been also deleted. Also I have a new change in the last merge commit I want to delete and I want to transfer that modification in the B commit because it would be an addition to the same file as in the B commit.

The final history I want to have:

B
|
|
C
|
|
D
|
|
E

Upvotes: 2

Views: 3362

Answers (2)

torek

Reputation: 489748

TL;DR

Use git reset --soft (as you are doing but with a different target, HEAD^2 or the raw hash ID of commit C) and then git commit. You might want an extra option or two with your git commit. See the long answer for more.

(Note, too, that you'll need git push --force as in VonC's answer. I suspect he wrote that answer before you mentioned that you have a fix in commit A too.)

Long

Let's correct a few statements-of-fact that are ... well, they're wrong in a subtle way. They're right in terms of what you see happening.

I try to reset --soft HEAD~1 to delete the A merge commit but with that command the other commits, C and D have been also deleted.

This is not actually the case. The commits have not been deleted. They just become hard to find. The reason for this is straightforward: Git actually works backwards.

Let me re-draw your sequence of commits horizontally, the way I prefer for StackOverflow postings, with older commits towards the left and newer commits towards the right. That gives me this drawing:

...--F--E---B--A   <-- somebranch (HEAD)
         \    /
          D--C

where, based on the result of reset, we see that B is the first parent of A. Running git log at this point will:

  • show commit A; then
  • show commit B because A links back to B; then
  • show commit C because A links back to C; then
  • show commit D because C links back to D; then
  • show commit E because both B and D link back to E

and so on. The precise ordering for showing B, C, and D depends on any commit-sorting options you give to git log: --topo-order forces a sensible graph order, for instance, while --author-date-order uses the author date and time stamps. The default is to use the committer date and time stamps, with the most recent commits being seen before less-recent commits.

When we do the reset, we get this. I need to move B up a line because of the way I draw the graph, but A still links back to B and C both:

          B___ <-- somebranch (HEAD)
         /    \
...--F--E      A
         \    /
          D--C

That is, after the git reset --soft HEAD~1, the name somebranch now selects commit B instead of commit A.

Because Git works backwards, we no longer see commits A, C, and D. The git log operation starts with commit B, and shows it; B then links back to E, so git log moves to E and shows it; and E links back to F so we see F, and so on. We never have a chance to move forward to D, C, or A: that's simply impossible, because Git works backwards.
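A throwaway repository makes this concrete. The sketch below is only an illustration: the file names, the temporary branch name side, and the demo identity are all made up, and it assumes git is on your PATH. It rebuilds the graph from the question, runs the same git reset --soft HEAD~1, and then checks that C still exists as an object even though git log no longer walks to it.

```shell
set -e
# Hypothetical identity so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/somebranch   # name the unborn branch

commit() {  # usage: commit <file> <message>; appends a line and commits it
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit base.txt F
commit base.txt E
git branch side                  # the side line of development starts at E
commit b.txt B
git checkout -q side
commit d.txt D
commit c.txt C
git checkout -q somebranch
git merge -q -m A side           # merge commit A: parents are B, then C
git branch -q -d side            # now only A leads back to C and D

C=$(git rev-parse HEAD^2)        # remember C's hash before moving the name
git reset -q --soft HEAD~1       # somebranch now points to B

visible=$(echo $(git log --format=%s))   # subjects reachable from B
echo "git log shows: $visible"
echo "object type of C: $(git cat-file -t "$C")"
```

With this setup git log lists only B, E and F, while git cat-file still reports C as a commit: nothing was deleted, the walk just never gets there.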

The final history I want to have:

E--D--C--B   <-- somebranch (HEAD)

Now, in fact, commit B (where B stands in for some big ugly hash ID) connects back to commit E. This will always be the case: no existing commit can ever be changed at all. So this history is not possible. We can, however, make a new commit B' that is a lot like B, but different.

Also I have a new change in the last merge commit I want to delete and I want to transfer that modification in the B commit ...

When we make our new B' commit that is like-B-but-different, we can do that as well.

Sidebar: more about commits and how Git makes them

Every commit in Git has two parts:

  • Each commit has a full snapshot of every file that Git knows about at the time you (or whoever) make the commit. These snapshots store files, but not the same way your computer stores them. Instead, their names and contents are stored as internal Git objects, and these objects are compressed and de-duplicated (and frozen for all time as well). The de-duplication means that if you have some series of commits C1, C2, C3 with thousands of files each, but only one file actually changes in these commits, the thousands of files are all shared. The new commits only have one new file each. Even then, the new data is compressed and Git-ified in various ways that might turn a big file into just a tiny delta (eventually—this happens late in the game, in Git, because you get better deltas that way).

  • Each commit also stores some metadata, or information about the commit itself. This includes the author and committer information: who made the commit, and when. It includes a log message: you get to write this yourself, if you're making the commit. And—all important for Git's own purposes—a commit includes the raw hash ID, those big ugly strings like 225365fb5195e804274ab569ac3cc4919451dc7f, for each of the commit's parents. For most commits, that's just the one earlier commit; for merge commits like your commit A, that's a list of two commit hash IDs (for B and C, in that order).
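To see both parts for yourself, you can ask Git to print a raw commit object. This is a minimal sketch in a scratch repository (the file names, the branch name side, and the demo identity are invented for the example): it makes a small merge, dumps the merge commit with git cat-file -p, and lists the full snapshot with git ls-tree.

```shell
set -e
# Hypothetical identity so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/somebranch

commit() {  # usage: commit <file> <message>; appends a line and commits it
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit base.txt E
git branch side
commit b.txt B
git checkout -q side
commit c.txt C
git checkout -q somebranch
git merge -q -m A side           # merge commit A

# The metadata: a tree line, two parent lines, author, committer, message.
git cat-file -p HEAD

parents=$(git cat-file -p HEAD | grep -c '^parent ')
first=$(git log -1 --format=%s HEAD^1)    # first parent
second=$(git log -1 --format=%s HEAD^2)   # second parent
files=$(echo $(git ls-tree -r --name-only HEAD))   # the full snapshot
echo "parents: $parents ($first, then $second); snapshot: $files"
```

Note that the snapshot lists every file, not just the ones the merge touched.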

The metadata in a new commit comes from your user.name and user.email settings—since that's where your name and email address are—and from information Git can find right now, e.g., the current date and time as stored in your computer's clock. (If the clock is wrong, the date-and-time-stamps on the commit will be wrong too. No big deal, they're just used to confuse humans. 😀) The parent of the new commit is ... the current commit, as pointed-to by the current branch name.

So, if we want new commit B' to point back to existing commit C, we need commit C—not commit B, and not commit E—to be the current commit. To make that happen, we need to make the name somebranch point to commit C.

There are a lot of ways to move branch names around in Git, but the one we'll use here is git reset. The git reset command is big and complicated, and one of the complications is that it can reset Git's index. So let's mention the index.

The index—which Git also calls the staging area, referring to how you use it, and sometimes also calls the cache, though these days that's mostly in flags like --cached, as in git rm --cached or git diff --cached—is where Git gets the files to put into a new commit. In other words, the index holds the proposed snapshot for the new commit. When you make a new commit, that new commit will have both metadata and a snapshot, and the snapshot comes from Git's index.

When we describe the index as a staging area, we talk about how we change working tree files and then use git add to copy them into the staging area. This isn't wrong, but this picture is incomplete: it suggests that the staging area starts out empty, and gradually fills up. But in fact, it starts out full of files. It's just that the files it is full of, are the same files as in the commit and in the working tree.

When you run git status and it says, e.g.:

On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   Makefile

this doesn't mean that only Makefile is going into the next snapshot. In fact, every file is going into the next snapshot. But the Makefile in Git's index / staging-area right now is different from the Makefile in the HEAD commit right now.

If I run git diff --cached (or git diff --staged, which is exactly the same thing) right now, I get this:

diff --git a/Makefile b/Makefile
index 9b1bde2e0e..5d0b1b5f31 100644
--- a/Makefile
+++ b/Makefile
@@ -1,3 +1,4 @@
+foo
 # The default target of this Makefile is...
 all::
 

I put some bogus stuff at the front of Makefile and ran git add Makefile to get here, and that means that I had Git kick the existing HEAD-commit copy of Makefile out of the index, and put in the existing working-tree copy of Makefile instead. That's where the line foo came from.

If I use git restore --staged Makefile, as Git suggests here, that copies HEAD:Makefile to :Makefile. The colon-prefix syntax here is specific to certain Git operations (like git show for instance) and allows you to read the copies of files inside Git. The copy of Makefile in my working tree isn't inside Git, so there is no special syntax for that: it's just a plain ordinary file. But there is a special syntax, with this colon stuff, for some Git commands. Use, e.g., git show HEAD:Makefile to see the committed copy, and git show :Makefile to see the index copy.

In any case, I now follow Git's advice:

$ git restore --staged Makefile
$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   Makefile

no changes added to commit (use "git add" and/or "git commit -a")

The git restore --staged that I ran copied the HEAD copy of Makefile into the index / staging-area. So now that these two are the same, git status does not say anything about them being staged for commit. But now the index Makefile and my working tree Makefile are different, so now git status says that these two are different.
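Here is the same walk-through as a small script you can run in a scratch directory. It is a sketch under a few assumptions: the one-line Makefile and the demo identity are invented, and git restore needs Git 2.23 or later. It captures the HEAD copy, the index copy, and the working-tree copy at each step so you can compare them.

```shell
set -e
# Hypothetical identity so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
echo 'all::' > Makefile
git add Makefile
git commit -q -m initial

printf 'foo\nall::\n' > Makefile      # edit the working-tree copy
git add Makefile                      # copy it into the index

head_copy=$(git show HEAD:Makefile)   # committed copy
staged_copy=$(git show :Makefile)     # index copy, now with the foo line

git restore --staged Makefile         # copy HEAD:Makefile back into the index
index_after=$(git show :Makefile)     # matches HEAD again
worktree_after=$(cat Makefile)        # plain file, untouched by --staged

echo "HEAD:        $head_copy"
echo "index after: $index_after"
echo "work tree:   $worktree_after"
```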

About git reset

The git restore command I'm using here is new-ish, having been introduced in Git 2.23. The git reset command is much older. It is a big and complicated command, so we'll only look at a subset of the ways we can use it.

When used as:

git reset --soft HEAD~1

for instance, this kind of git reset moves the current branch name. That is, we take a drawing like this:

          B___
         /    \
...--F--E      A   <-- somebranch (HEAD)
         \    /
          D--C

and move somebranch so that it points to B, like this:

          B___ <-- somebranch (HEAD)
         /    \
...--F--E      A
         \    /
          D--C

No commit changes. No commit can change.

If we were to use git reset --mixed, we'd have Git move the branch name and change out all the copies of files that are in Git's index. If we were to use git reset --hard, we'd have Git move the branch name, change out the copies of files in Git's index, and replace the copies of files in our working tree. So this particular kind of git reset does up to three things:

  1. Move our HEAD around. Using the argument we gave, and the rules from git rev-parse / gitrevisions, find some commit. Whatever branch name we're on—if git status says on branch somebranch, that's somebranch—make that name point to that commit's hash ID.

    If --soft, stop! Otherwise, go on to...

  2. Replace all the files that are in Git's index. The replacement files come from the commit we picked in step 1.

    If --mixed or no option, stop! Otherwise (--hard), go on to...

  3. Replace working tree files the same way that index files got replaced in step 2.

If you've followed through all of this, you can see that git reset --mixed and git reset --hard can, if we pick the current commit as the new commit, just reset the index, or reset the index and replace the working-tree files. And if we don't give git reset a particular commit hash ID or name or relative instruction like HEAD~1 or HEAD^2, git reset uses HEAD. So git reset --soft HEAD or git reset --soft is just a way to do nothing at all, but git reset HEAD or git reset is a way to clear out Git's index, making it match HEAD again. (You don't want to do this—I'm just noting it here so that you can get a proper mental model of what git reset does.)
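If you want to watch the three steps happen, the following sketch tries each mode against the same starting point in a scratch repository. The file f.txt, its contents, and the demo identity are made up for the example. For each mode it records which commit the branch names, what the index holds, and what the working tree holds.

```shell
set -e
# Hypothetical identity so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
echo old > f.txt;   git add f.txt; git commit -q -m one
echo newer > f.txt; git add f.txt; git commit -q -m two
tip=$(git rev-parse HEAD)

state() {  # prints: <subject of HEAD> <index copy> <working-tree copy>
    echo "$(git log -1 --format=%s) $(git show :f.txt) $(cat f.txt)"
}

git reset -q --soft HEAD~1;  soft=$(state)    # step 1 only
git reset -q --hard "$tip"                    # back to the starting point
git reset -q --mixed HEAD~1; mixed=$(state)   # steps 1 and 2
git reset -q --hard "$tip"
git reset -q --hard HEAD~1;  hard=$(state)    # steps 1, 2 and 3

echo "soft:  $soft"     # one newer newer
echo "mixed: $mixed"    # one old newer
echo "hard:  $hard"     # one old old
```

In all three cases the branch name moves to the commit "one"; the modes differ only in how far the old contents follow it.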

About git commit

When you run git commit, Git:

  • collects any necessary metadata, including the log message;
  • adds the appropriate parent commit hash ID(s): usually just that for HEAD, but if you're committing a merge, HEAD plus more;
  • packages up whatever is in Git's index as the new snapshot;
  • writes all of that out as a new commit, which gets a new, unique hash ID; and
  • writes the new hash ID into the branch name.

That last step is how we got from:

...--F   <-- somebranch (HEAD)

to:

...--F--E   <-- somebranch (HEAD)

way back when, for instance. You did a git checkout somebranch or git switch somebranch. That:

  • picked commit F, because somebranch pointed to commit F;
  • filled in Git's index from the commit;
  • filled in your working tree from the commit (as now represented in Git's index); and
  • attached the name HEAD to the name somebranch, so that Git knows that a future commit should write into somebranch.

Then you modified some files and ran git add. This copied any updated files into Git's index, ready to be committed. The index continued to hold the proposed next commit (or snapshot part), with git add changing the proposed snapshot, by ejecting some of the current index files and putting new (updated) files in, instead. It's actually the git add step that does all the Git-ifying of files, making them ready to be committed.

Finally, you ran git commit. This packaged up the index copies of all files, to make the new snapshot. It added the right metadata. It made the commit, which got Git the hash ID for commit E. (This also put commit E into Git's database of all-the-commits-and-other-objects.) Last, it wrote E's hash ID into the name somebranch, and now you had:

...--F--E   <-- somebranch (HEAD)

with the current commit and Git's index matching again. If you git add-ed all your updated files, the commit, the index, and your working tree all match. If you only git add-ed selected files, you still have some working tree files that don't match the commit, and you can git add them and make another commit.
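You can imitate those git commit steps by hand with Git's lower-level commands, which makes the role of the index and the branch name easier to see. This is only a rough sketch of the idea, not what git commit literally runs (the real command also handles hooks, editors, merges in progress, and so on), and the file name and demo identity are invented.

```shell
set -e
# Hypothetical identity so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/somebranch
echo old > f.txt; git add f.txt; git commit -q -m F
F=$(git rev-parse HEAD)

echo newer > f.txt
git add f.txt                                # update the proposed snapshot in the index
tree=$(git write-tree)                       # package the index as a snapshot
E=$(git commit-tree "$tree" -p HEAD -m E)    # metadata + parent + snapshot = new commit
git update-ref refs/heads/somebranch "$E"    # write the new hash into the branch name

history=$(echo $(git log --format=%s))
echo "history: $history"
```

After the last line, somebranch names the new commit E, whose parent is F, and the index matches the current commit again.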

Where you are now

Meanwhile, we're now in this state:

          B___
         /    \
...--F--E      A   <-- somebranch (HEAD)
         \    /
          D--C

Commit B is, in some sense, bad. You don't want commit B. It is going to stick around for quite a while—at least 30 days from when you made it—even after we set things up so that you can't see commit B, but that's OK, Git will eventually purge it when it's been sitting around too long unused.

This means commit A is bad too, because commit A permanently links back to commit B. (A links back to C too, but C is OK.) No part of any existing commit can ever be changed, so to abandon B, we have to abandon A too.

So: let's use git reset to move somebranch, so that somebranch locates commit C. We could use any of the three reset options here, but one of those options makes things easy:

  • If we use git reset --soft, the index remains unchanged. Git's index currently matches the snapshot in merge commit A. This is the snapshot you said you want to keep.

  • If we use --mixed or --hard, Git will empty out its index and fill it from commit C. That's not terrible—the files we want are still there in commit A—but it's clearly not as useful.

So let's run git reset --soft hash-of-C. Or, because the current commit is commit A, we can use HEAD^2. If we look at the gitrevisions documentation, we find that HEAD^2 means the second parent of the current commit. That will be commit C. Note that we need to have commit A checked out right now to have the right stuff in Git's index, so if we're not on commit A at this point, we had better check it out first.

The end result is this:

          B___
         /    \
...--F--E      A
         \    /
          D--C   <-- somebranch (HEAD)

Once we have this, we're ready to run git commit. Git will use whatever is in Git's index—which, thanks to --soft and our previous position at A, is the set of files from commit A—to make the new commit. We'll call the new commit B'; let's draw it in:

          B___
         /    \
...--F--E      A
         \    /
          D--C--B'  <-- somebranch (HEAD)

Commit A cannot be seen. There's no name (branch name) by which to find it. We can run git log and give it A's raw hash ID, and that will find commit A, but we can't see it otherwise. So let's update our drawing as if there is no commit A. Since A is the only way to find B, let's leave B out as well:

...--F--E--D--C--B'  <-- somebranch (HEAD)

So our final sequence of commands is:

git checkout somebranch                # if necessary
git log --decorate --oneline --graph   # make sure everything is as expected


git reset --soft HEAD^2
git commit

A note about HEAD^2: beware of DOS/Windows CLIs that eat ^ characters. You may have to use HEAD^^2, or quotes, or something, to protect the ^.
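If you'd like to rehearse this before touching your real branch, here is the whole thing in a scratch repository. Everything about the setup is an assumption made for the demo: the file names, the temporary side branch, the demo identity, and the way the extra fix is folded into the merge with --amend. It uses git commit -C with the saved hash of B so that no editor opens.

```shell
set -e
# Hypothetical identity so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/somebranch

commit() {  # usage: commit <file> <message>; appends a line and commits it
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit base.txt F
commit base.txt E
git branch side
commit b.txt B
git checkout -q side
commit d.txt D
commit c.txt C
git checkout -q somebranch
git merge -q -m A side
echo fix >> b.txt                # the extra change that lives only in A
git add b.txt
git commit -q --amend -m A       # fold it into merge commit A
git branch -q -d side

A=$(git rev-parse HEAD)
B=$(git rev-parse HEAD^1)        # saved now; harder to find after the reset

git reset -q --soft HEAD^2       # branch name moves to C; index still holds A's snapshot
git commit -q -C "$B"            # new commit B', reusing B's message

history=$(echo $(git log --format=%s))
echo "history: $history"
```

The new tip has a single parent (C), carries B's message, and has exactly the same snapshot as the old merge commit A, fix included.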

One last refinement

When you run git commit, Git will need a log message. If the log message in existing commit B is good and you want to re-use it, you can tell Git to do that. The git commit command has a -c or -C option. Running:

git commit -C <hash-of-B>

will grab the commit message from commit B and use that. You won't be tossed into your editor to come up with a commit message.

If the commit message in B could be improved, you might want to be tossed into your editor. To do that, add --edit, or change the uppercase -C into a lowercase -c:

git commit --edit -C <hash-of-B>

or:

git commit -c <hash-of-B>

Note that after git reset, it becomes hard to find the hash of B, so you might want to save it. There is a trick with Git's reflogs to get it, though: somebranch@{1} is the old value of somebranch before the reset, so:

git commit -c somebranch@{1}~1

will work. I generally find it easier, though, to use git log and then cut and paste the raw hash IDs with the mouse, than to type in complicated name@{number}~number^number expressions.
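To convince yourself that the reflog expression really names B, you can compare it against a hash saved before the reset. This sketch again uses an invented scratch repository (file names, the side branch, and the demo identity are assumptions), and it uses -C rather than -c so that no editor opens. The quotes around the expression keep the shell from touching the braces.

```shell
set -e
# Hypothetical identity so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/somebranch

commit() {  # usage: commit <file> <message>; appends a line and commits it
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit base.txt E
git branch side
commit b.txt B
git checkout -q side
commit c.txt C
git checkout -q somebranch
git merge -q -m A side

B=$(git rev-parse HEAD^1)                         # the hash we would otherwise paste
git reset -q --soft HEAD^2                        # somebranch: A -> C

from_reflog=$(git rev-parse 'somebranch@{1}~1')   # old tip (A), then its first parent
git commit -q -C 'somebranch@{1}~1'               # reuse B's message without the hash

echo "reflog finds B: $from_reflog"
echo "new tip message: $(git log -1 --format=%s)"
```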

Upvotes: 4

VonC

Reputation: 1328712

If you don't have any work in progress, I would simply do:

git switch mybranch # commit A
git reset --hard C
git cherry-pick B

That way, you are recreating B on top of the new 'mybranch' HEAD C.
A git push --force will be needed after that (if the branch was previously pushed), so, if you are not alone working on that branch, make sure to notify your colleagues.
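As a rough rehearsal of those three commands, the sketch below rebuilds the graph in a scratch repository and applies them. The setup is invented for the demo (file names, a temporary side branch, a demo identity, a branch called somebranch instead of mybranch, and a fix folded into A with --amend). It also checks whether the extra change that lived only in A survives, since cherry-pick copies only B's own change.

```shell
set -e
# Hypothetical identity so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/somebranch

commit() {  # usage: commit <file> <message>; appends a line and commits it
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit base.txt F
commit base.txt E
git branch side
commit b.txt B
git checkout -q side
commit d.txt D
commit c.txt C
git checkout -q somebranch
git merge -q -m A side
echo fix >> b.txt                # extra change that exists only in A
git add b.txt
git commit -q --amend -m A
git branch -q -d side

B=$(git rev-parse HEAD^1)
C=$(git rev-parse HEAD^2)

git reset -q --hard "$C"             # branch, index and working tree all go to C
git cherry-pick "$B" >/dev/null      # re-create B's change on top of C

history=$(echo $(git log --format=%s))
if grep -q fix b.txt; then has_fix=yes; else has_fix=no; fi
echo "history: $history; fix from A present: $has_fix"
```

In this setup the history comes out linear, but the fix from A is not carried over, so it would have to be re-applied separately.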

Upvotes: 2
