Reputation: 449
I have created two branches, Branch_1 and Branch_2. Now I am finally ready to merge Branch_2 into the master branch. But when I did the merge, it showed all the commits for Branch_1 and Branch_2 because of the multiple merges in between. Can anyone suggest how to proceed in this case, so that I have a single commit before merging my code into the master branch?
My Git repository looks like this:
git log --oneline --graph --color --all --decorate
* 36dbb26 (origin/Branch_2) changed abc
* 1a7bf25 changed T
* 110095a changed Z
* 1087d5d Merge remote-tracking branch 'origin/Branch_1' into Branch_2
|\
| * 8c9d02a (origin/Branch_1) sleep added between each processing to discover partitions
| * ca401cb changed S
| * 20a4edd changed R
* 3f472ef install package
* 1087d5d Merge remote-tracking branch 'origin/Branch_1' into Branch_2
|\
| * 8c9d02a (origin/Branch_1) adding y
| * ca401cb changed g
| * 97c326d changed f
* | fd543bf changed c
* | 7b24330 (HEAD -> master, origin/master, origin/HEAD) fix D
* | 53aecb4 adding x
|/
* 49d3bda changed e
| * 213ea18 (origin/Feature_branch) changed d
| * 0b3b675 changed c
|/
* df6ac90 Adding c
* 96699ff Adding b
* 99f165f Adding a
I want an end result like the one below (all the commits from fd543bf onward merged into one commit):
* 36dbb26 (HEAD -> master, origin/Branch_2) changed R-All consolidated
* 7b24330 (origin/master) Fix D
Upvotes: 0
Views: 1947
Reputation: 51850
To squash everything into one commit, call git reset --soft followed by git commit:
# from Branch_2 :
git reset --soft master
git commit
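Here is a minimal sketch of those two commands in a throwaway repository, assuming a cut-down copy of the question's history (the commit names and the mk helper are hypothetical). It checks that Branch_2 ends up as one ordinary commit on top of master, carrying the same snapshot the branch had before the reset.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# mk NAME: hypothetical helper that commits a new file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk a; mk D                               # master
git checkout -q -b Branch_1; mk f; mk g
git checkout -q -b Branch_2 master; mk c
git merge -q --no-edit Branch_1          # a merge in between, as in the question
mk Z; mk T

tree=$(git rev-parse 'HEAD^{tree}')      # Branch_2's final snapshot
# From Branch_2: move the name back to master, keeping index and work-tree,
# then commit that snapshot as one ordinary single-parent commit.
git reset -q --soft master
git commit -q -m "Branch_2: all changes consolidated"
git log --oneline --graph
```

Since master is now an ancestor of Branch_2, a later git merge Branch_2 run on master can simply fast-forward.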
Upvotes: 2
Reputation: 488163
You probably just want git log --first-parent.
... But when I did the merge, it showed all the commits for Branch_1 and Branch_2 because of the multiple merges in between.
No, that's not why. The reason you see all these commits is because you do in fact have all these commits.
The thing to understand here is that, in the end, Git is all about commits. Commits are the unit of storage in Git.1 Commits are what you have, and they are what you want. If you don't want these commits, what you want must be some other commits. Commits are all you get, so you'd better want commits. (If you want something else, don't use Git. But many other version control systems are commit-oriented as well, so you may find that you still get commits anyway, so you might as well stick with Git, unless ... well, read the next paragraph.)
Branch names, in Git, exist for one primary reason: to find commits. This is where Git differs from other version control systems. In many version control systems, a branch is a container for commits, and you can inspect commits by inspecting branches: the set of commits contained in the branch is the set of commits you'll see, if that's how you ask. But that's not how branch names work in Git.
In Git, a commit can be—and often is—on many, or perhaps even all, branches at the same time. That's because Git's branch names are not containers. They do not hold commits. They simply let you find commits. Each name finds one commit. It's the commits themselves that find the rest of the commits.
Each Git commit is made up of two parts, which we'll describe later. Each commit is found by its unique hash ID. Each commit has one of these hash IDs; that hash ID is the "true name", as it were, of the commit. Without the hash ID, Git literally can't find the commit.2 So a branch name holds one hash ID, which is, by definition, the last commit that is contained within that branch. That commit in turn holds some set of hash IDs—usually just one—of earlier commits that are also part of the branch.
When we have a branch name, like main or feature, that holds some hash ID, we say that the branch name points to the last, or tip, commit of the branch:
<-H <-- feature
But commit H (H here stands in for the real hash ID, whatever it is) has the hash ID of some earlier commit G. So we say that H points to G:
<-G <-H <-- feature
But commit G also points backwards to a still-earlier commit:
... <-F <-G <-H <-- feature
and so on, all the way back to the very first commit ever. This one literally can't point backwards to an earlier commit, so it just doesn't, and that's where Git stops working backwards.
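To see this backwards walk in action, the sketch below (a throwaway repository; the mk helper and the commit names F, G, H are hypothetical) starts from the one hash ID stored in the name feature and follows each commit's parent link by hand until it reaches the root commit, which has no parent.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/feature
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# mk NAME: hypothetical helper that commits a new file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk F; mk G; mk H                         # feature now points to H

c=$(git rev-parse feature)               # the one hash ID the branch name holds
walk=
while [ -n "$c" ]; do
  walk="${walk:+$walk }$(git log -1 --format=%s "$c")"
  # %P lists the parent hash IDs; the root commit has none, so c ends up empty
  c=$(git log -1 --format=%P "$c" | cut -d' ' -f1)
done
echo "$walk"                             # prints: H G F
```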
So, this is what it means for a commit to be on a branch: we start with the branch name, which automatically determines the last commit on that branch, and then work backwards. But if that's the case ... well, suppose we have something like this, where commit I points back to H, and commit K also points back to H:
I--J <-- br1
/
...--G--H
\
K--L <-- br2
Which branch holds commit H?
Git's answer is that commit H is now on both branches at the same time. So are all earlier commits. Furthermore, even if H is the last commit on some third branch:
I--J <-- br1
/
...--G--H <-- main
\
K--L <-- br2
this is still the case. Commit H is now on all three branches.
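A minimal sketch, assuming a scratch repository laid out like the drawing above (mk is a hypothetical helper that commits one new file): git branch --contains asks which branch names can reach a given commit, so it reports H on all three names and J on br1 alone.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# mk NAME: hypothetical helper that commits a new file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk G; mk H                               # main -> H
git branch br1; git branch br2           # both new names point to H as well
git checkout -q br1; mk I; mk J
git checkout -q br2; mk K; mk L
git checkout -q main

H=$(git rev-parse main)
git branch --contains "$H"               # br1, br2 and main: H is on all three
git branch --contains br1                # only br1: J is on just that branch
```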
Hence, in Git, the set of branches that contain some commit is dynamic and fluid. What matters is not the branch names, but the connections from commit to commit. The branch names are useful, but only to get you started. Everything else is all about the commits.
1Because commits are made up of smaller parts, it is possible to work at a lower level. But this is roughly analogous to breaking molecules, like salt, into atoms—sodium metal and chlorine—or even subatomic particles like protons, neutrons, and electrons. Once you break them up like this, they're not useful any more, not in the way that the salt is anyway. You can't season your food with sodium metal, nor with chlorine, and especially not with neutrons.
2There are some maintenance commands, git fsck and git gc in particular, that simply look over every commit in the repository and figure out which ones connect to which other ones and so on. This is very slow, so it's not the way you use Git in day-to-day operation. In a bigger repository like the Linux kernel, a git checkout or git log will sometimes take up to a few seconds, but a git fsck or git gc could take many minutes. Some of this depends on the speed of your computer, its file systems, and so on, but the contrast is pretty clear: finding a commit by hash ID is fast, but finding it any other way is usually excruciatingly slow.
We mentioned above that each commit has two parts. These are:
the main data, a snapshot. Here, Git saves, for all time,3 a read-only snapshot of every file's name and content as of the time you, or whoever, made the commit. This allows you—or anyone else—to get back all those files as of that snapshot.
the metadata. Here, Git saves the name and email address of the person who made the commit. Git saves a date-and-time-stamp for when they made the commit. (Git actually has two name-and-address-and-time fields per commit, here, though most people normally only look at one.) Git lets you add a description—a log message—explaining why you made this commit, if you like. And, key for Git itself, this is also where Git stores those earlier-commit hash IDs. Git keeps a list of such hash IDs. Most commits just have that one entry, which tells Git what the parent of the commit is.
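Both parts are easy to see with the plumbing command git cat-file -p, which prints a commit object raw. In this sketch (scratch repository, hypothetical mk helper and demo identity), commit b's metadata lists its tree, its parent a, and the author and committer lines, while git ls-tree shows that its snapshot holds both files, not just the one b added.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# mk NAME: hypothetical helper that commits a new file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk a; mk b
git cat-file -p HEAD                     # tree, parent, author, committer, blank line, message
git ls-tree --name-only HEAD             # a.txt and b.txt: the full snapshot, not a diff
```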
It's the parent in the metadata that lets Git show you a commit—which is a snapshot, not a set of changes—as a set of changes. If we have two commits in a row:
... <-F <-G ...
and we take the snapshots out of both F (the parent) and G (the child) and compare them, whatever is the same is not changed, and whatever isn't the same ... well, comparing those will tell you what changed. So that's what Git shows: the changes. But to get those changes, Git needed two commits, to get the two snapshots.
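As a small sketch (scratch repository; the commit names F and G are just labels), each commit stores whole file contents, and the "change" git show reports for G is the same comparison git diff makes between the parent's and the child's snapshots:

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > a.txt
git add a.txt
git commit -q -m F                       # parent: a.txt contains "one"
echo changed >> a.txt
git commit -q -am G                      # child: a.txt contains "one" and "changed"
# Both commands compare F's snapshot with G's snapshot:
git diff HEAD~1 HEAD
git show HEAD
```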
3While no part of any commit can ever change, not all commits have to last forever, so saying for all time is an overstatement. Given a commit's hash ID, if Git can find that commit, that commit is that commit. It's not any other commit. It must be the commit that had that hash ID the last time you looked. In other words, the commit is still there, so it's unchanged, and its files are still the same way they were then.
You can, however, get Git to delete a commit. It's not easy: Git is built to add new commits while keeping existing commits, and most of the normal everyday commands you use work this way. But you can, with some effort, make some commits hard to find. Once you do this, and leave them un-find-able (except by maintenance commands) long enough, Git will eventually decide that they must be unwanted trash, and throw them out for real. The git gc maintenance command in particular does this. Once that happens, if you have saved the hash ID somewhere else (written it on a whiteboard, for instance) and type it in correctly, Git will say I don't have anything with that ID.
Because Git is built to add commits, and when two Gits connect and have Git-sex, the receiving Git is usually very willing to add all the sending Git's new commits to itself, new commits spread like viruses. So just because you added, but then retracted, a commit doesn't mean it didn't get out to some other Git. It may come back to you later:
Don't be afraid to make temporary commits, but do remember that if you let other Gits talk to your Git, they might copy your temporary commits, and present them back to you later—so either be careful about which repositories you let your repository have Git-sex with, or be careful about letting sensitive data get into your temporary commits, or both.
Note, too, that when you use git push, you choose which commits your Git sends to some other Git, so git push is safer for you (you choose which commits, including temporary ones, you send) than if you allow all users everywhere to read your repository (and hence read all your temporary commits).
Receiving Gits, of course, have to be pretty careful. That's why hosting sites like GitHub offer access control (which is not something built directly into Git itself, but rather, an add-on).
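Footnote 3's point about commits becoming hard to find can be sketched like this (scratch repository, hypothetical mk helper). After git reset moves main back, no branch name finds commit b, yet Git still finds it by its hash ID. It should stay that way at least until its reflog entries expire and git gc prunes it, which with default settings takes on the order of weeks, not seconds.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# mk NAME: hypothetical helper that commits a new file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk a; mk b
lost=$(git rev-parse HEAD)               # b's hash ID, "written on the whiteboard"
git reset -q --hard HEAD~1               # main now finds a; no name finds b any more
git branch --contains "$lost"            # prints nothing: b is on no branch
git cat-file -t "$lost"                  # prints "commit": b still exists
```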
When we have divergent work, such as:
I--J <-- br1
/
...--G--H
\
K--L <-- br2
we might want to combine the two divergent lines of work. That way, we can get a commit that adds both the feature that someone added in br1 and the feature that someone added in br2. That's what git merge is meant for.
Now, git merge, as a command, does not always make a merge commit. We need to distinguish carefully between the verb form, to merge, meaning to combine work, and the noun or adjective form, a merge or a merge commit, meaning a commit resulting from doing the work-combining:
The verb form, to merge, is what git merge usually (or at least often) does.
The noun form, a merge, or its adjective equivalent, a merge commit, is what Git usually (or at least often) makes after doing the to merge work.
So you can see that these are closely related, but not the same thing. One is a process; the other is a result.
We won't go into details about how the process works, but when the result of a merge is a merge commit, that merge commit is just like any other commit, except instead of having a single parent, it has two or more. (Most merge commits have exactly two parents; I'll go into the or more part in a later section.) Remember, all commits have their two parts: snapshot, and list-of-parents. What's special about a merge commit is that its list has two or more parents.
Now, the first parent of any new commit is simply the commit you started with. You run:
git checkout br1
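Then you do some stuff to make a new commit, and eventually, you run git commit. Git builds a new commit, with a new and unique hash ID, by:
saving a snapshot of every file;4
adding metadata: your name and email address, the current date and time, your log message, and, as the one and only parent, the hash ID of the commit you started with;
writing all of this out as a new commit, which gets its new, unique hash ID; and
writing that new hash ID into the current branch name, so that the name now points to the new commit.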
That's probably how you got commit J, for instance: you ran git checkout br1, which extracted commit I. You then made a new commit with git commit. The new commit's parent was commit I, so that J pointed back to I, and now the name br1 selected commit J instead of selecting commit I.
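A minimal sketch of those steps (scratch repository, hypothetical mk helper): after git commit on br1, the new commit J records the old tip I as its only parent, and br1 now names J.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# mk NAME: hypothetical helper that commits a new file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk H
git checkout -q -b br1
mk I                                     # br1 -> I
before=$(git rev-parse br1)
mk J                                     # git commit while "on" br1
git rev-parse br1^ "$before"             # same hash twice: J's only parent is I
```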
When you use git merge to make the new commit, however,5 Git doesn't write out a single-parent commit. This time, Git writes out a multi-parent commit, and then advances the branch name as usual. The first parent in the new commit's list of parents is the same as usual, but at least one additional parent goes into the list.
The additional parent, in this case, is the commit you selected when you ran git merge:
git checkout br1
git merge br2
This causes Git to use commit L as the other commit. So, after merging the work on the two branches and coming up with an appropriate snapshot, Git now makes new merge commit M like this:
I--J
/ \₁
...--G--H M <-- br1 (HEAD)
\ /²
K--L <-- br2
The (HEAD) here signifies that we're "on" branch br1, so that new commit M is the new tip of branch br1. Commit M has two parents instead of the usual one: the first parent is commit J, where branch br1 used to point a moment ago. The second parent is commit L. The branch name br2 has not changed, so it still points to commit L.
Because M points to L as well as to J, commits K-L are now on branch br1. This is why your git log shows them: they exist and are on the branch. Git finds them by going to commit M, then going backwards to both commits J and L, and from those two, to both commits I and K, and from those two, to commit H. (Of course, Git has to be careful to visit commit H exactly once, even though there are now two ways to get there. But that's easy enough for Git to do.)
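The same shape can be built in a scratch repository (mk is a hypothetical helper; the letters match the drawing). After git merge, br1^1 is J, br1^2 is L, and git log on br1 reaches K and L through the second parent.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# mk NAME: hypothetical helper that commits a new file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk G; mk H
git branch br2                           # br2 -> H for now
git checkout -q -b br1; mk I; mk J
git checkout -q br2; mk K; mk L
git checkout -q br1
git merge -q --no-edit br2               # new merge commit M on br1
git log -1 --format=%p br1               # two parents: J, then L
git log --format=%s br1                  # includes K and L, now reachable from br1
```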
4The snapshot is made from the copies of files that are in Git's index, not from the files you can see and work with. This is why Git makes you run git add so often.
5If the merge has a merge conflict, the to-merge process will stop in the middle and make you fix the conflict. An eventual git commit or git merge --continue will finish the merge and make a merge commit. To achieve that, before stopping in the middle, git merge writes out a special "in the middle of a conflicted merge" state. The git commit command checks for this state and finishes the merge, rather than making an ordinary single-parent commit.
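Footnote 5's conflicted case, sketched in a scratch repository (the file f.txt and its contents are made up): git merge stops with a conflict, the in-progress state is recorded (MERGE_HEAD holds br2's tip), and a plain git commit after resolving finishes it as a two-parent merge.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > f.txt; git add f.txt; git commit -q -m H
git checkout -q -b br2
echo theirs > f.txt; git commit -q -am L
git checkout -q main
echo ours > f.txt; git commit -q -am J
git merge br2 || true                    # stops in the middle: f.txt conflicts
git rev-parse MERGE_HEAD                 # the in-progress state remembers br2's tip
echo resolved > f.txt                    # fix the conflict by hand
git add f.txt
git commit -q --no-edit                  # finishes the merge with a two-parent commit
```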
Since you're griping, to some extent, about having to make multiple merge commits to merge more than one branch, it's time to mention Git's octopus merge. Suppose we have a "mainline branch" and two or more features that spring from it, perhaps from a single starting point commit or perhaps from multiple starting points:
o--o--o <-- feature1
/
...--o--o--o <-- main (HEAD)
\
o--o <-- feature2
We can merge the two feature branches one at a time:
o--o--o <-- feature1
/ \
...--o--o---o--M <-- main (HEAD)
\
o--o <-- feature2
and then:
o--o--o <-- feature1
/ \
...--o--o---o--M--N <-- main (HEAD)
\ /
o-----o <-- feature2
There is nothing wrong with this method. It works fine. The mainline branch, main here, now has two two-parent merge commits, M and N. The first parent of N is M; the first parent of M is the commit directly to its left, on the main line. The second parent of N shows how feature2 got merged, and the second parent of M shows how feature1 got merged.
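Sketching the two sequential merges in a scratch repository (hypothetical mk helper and commit names; main gets one extra commit, m, so that neither merge can be a fast-forward): main^1 is M, main^2 is the tip of feature2, and M's second parent, main^1^2, is the tip of feature1.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# mk NAME: hypothetical helper that commits a new file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk base
git checkout -q -b feature1; mk f1a; mk f1b
git checkout -q -b feature2 main; mk f2a
git checkout -q main; mk m               # main moves on, so no fast-forward
git merge -q --no-edit feature1          # merge commit M
git merge -q --no-edit feature2          # merge commit N
git log --graph --oneline
```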
Git also offers the ability to use a single merge commit to get this result, though only in some cases: there's no good way to do merge conflict resolution in this kind of merge, so an octopus merge must be conflict-free. The result looks like this:
o--o--o <-- feature1
/ \
...--o--o--o---M <-- main (HEAD)
\ /
o--o <-- feature2
Commit M here has three parents instead of just two. The first parent is directly behind it on the left, as usual. The second and third parents are the remaining two branch-tip commits from feature1 and feature2.
We get this by running:
git checkout main
git merge feature1 feature2
The fact that we named two commits makes git merge use the -s octopus merge strategy, which tries to merge all these commits (using an octopus-style merge base algorithm) and which does the merge only if it can do so without conflicts. This means there are some merges you could do with two regular two-parent merges that you cannot do with a three-parent octopus; but some people like octopus merges as they tie all the features in at once, and indicate that there were no conflicts (well, probably).6
Note that an octopus merge still results in putting all the commits on the merged-into branch (in this case main). Git simply follows all parents of the merge when you run git log, so that you see all the commits that are part of the branch.
6Because Git is a set of tools, rather than a complete solution, it's possible to construct an octopus merge that doesn't actually use git merge at all, or that went through two regular merges. But don't do that. We won't even look at how you could do that.
The git log command walks through commits, one at a time, moving backwards from commits to their parents. Whenever it encounters a merge commit, it has a choice of which commit(s) to move backwards to. But it does not insist on showing you every commit, or even moving to every commit reachable in this way. It just defaults to showing every commit.
You can limit which commits you see, and you can limit which commits git log will visit in the first place. If you limit the set of commits visited, you automatically limit the commits seen, so this is pretty powerful. We won't look at all the gory details here, but rather only at one very useful and important option: --first-parent.
When we use --first-parent, we are telling Git: Whenever you reach a merge commit, pretend that this merge commit has only a single parent, namely, its first parent. In other words, ignore the merged-in commits entirely, and don't even walk down those paths.7 If we have:
I--J
/ \₁
...--G--H M--N--O--P <-- main (HEAD)
\ /²
K--L
where some merge occurred at point M, and we run git log, we'll see commits P, O, N, M, J, L, K, I, H, and so on (with the ones between M and H happening in some order).8 But if we run:
git log --first-parent
the walk will pretend that commit M has only one parent, J, and we'll visit commits P, O, N, M, J, I, H, and so on, in that order. We never even look at commits K-L, so we never see them.
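A sketch that recreates the drawing above (scratch repository, hypothetical mk helper; the branch holding K and L is called side here): the full git log visits all ten commits, while --first-parent visits only the eight on the main line.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# mk NAME: hypothetical helper that commits a new file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk G; mk H
git checkout -q -b side; mk K; mk L
git checkout -q main; mk I; mk J
git merge -q --no-edit -m M side         # merge commit M: first parent J, second parent L
mk N; mk O; mk P
git log --format=%s                      # all ten commits, including K and L
git log --first-parent --format=%s       # P O N M J I H G: K and L are never visited
```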
7Note that, just like a fork in a road that rejoins later, if you reverse your path—going down the road from your original destination back to your original starting point—what was a join is now a fork, and what was a fork is now a join. So, since Git works backwards, merges are actually where things branch, and branch points are where things come together. It's really all in how you look at it.
8When a merge offers git log a fork in the graph walk, the actual order in which the commits come out comes from the sorting options you give. The default sort is to show the highest commit date first. If all the computer clocks were accurate when all the commits were made, this shows the commits in the right order, but sometimes one computer's clock is off, and the commits can get weirdly mixed. Consider using git log --graph to help view the actual commit graph structure in difficult cases.
As I mentioned at the top of this answer, if you don't want these commits, you must want some other commits. When I said these commits I was both speaking in general—Git stores commits, so that's all you get—but also in specific. If you don't want merge commits, don't make merge commits in the first place. ("Don't start none, won't be none", as they say.)
Now, there are some huge disadvantages to this. If you don't make merge commits, you can't preserve the actual original work you did. You do have that choice, though. When you run git merge, you can use git merge --squash, for instance. This tells Git to go through the merging process, but to make an ordinary, non-merge, single-parent commit at the end. (It also turns on --no-commit, for no good reason.9)
If you do use this method, remember to delete the branch names that find the commits from before the merge, since those commits are now redundant with the single squash-merge commit that incorporates their work. If you allow those commits to come back into view later, they are likely to cause trouble. This is in many ways the same problem as the viral effect of letting temporary or incorrect commits escape to some other Git repository: Git is built to add commits, not to discard them. And by doing a squash-merge, which does not leave a merge trace, you set a trap for your future self, unless those now-unwanted commits really disappear forever.
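A minimal sketch of a squash merge (scratch repository, hypothetical mk helper): git merge --squash stages the combined result without committing, the follow-up git commit has a single parent, and the branch name feature is then deleted as this paragraph recommends.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# mk NAME: hypothetical helper that commits a new file NAME.txt with message NAME
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk base
git checkout -q -b feature; mk f1; mk f2
git checkout -q main; mk m
git merge -q --squash feature            # does the merge work, but makes no commit
git commit -q -m "feature, squashed"     # ordinary single-parent commit
git branch -q -D feature                 # drop the name so f1 and f2 can't resurface
```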
If you have multiple merges to do, and each will have some conflicts to resolve, you can do them as normal (non-squash) merges or as squash merges. The result will be multiple commits: either multiple merge commits, or multiple ordinary single-parent commits. After doing either of these, you can then use git reset --soft to make the new merge-or-not-merge commits hard to find, and then use a plain git commit to make a new, single, ordinary commit that has the same snapshot as the final merge. As with git merge --squash, you should in general now consider the merged branches "dead": get rid of those commits, pretend they never existed, and hope they never come back to haunt you.
This is not a wrong thing to do, but it requires understanding of what you're doing. Do it only if you understand the consequences.
9The implied --no-commit is almost certainly just a leftover from the original shell script implementation, carefully preserved for all time in Git's behavior. It's annoying, since if you want this behavior, you can use git merge --no-commit --squash. Right now that's redundant, though.
Upvotes: 6