Lately, I started using git rebase to keep a feature branch up to date with a development branch.
I know the basics of git rebase, but I can't figure out whether rebase marks the commits it has already rebased.
I have a scenario where I need to sync the feature branch with the dev branch quite often.
So, I did the first rebase and resolved all the conflicts, and everything was fine. After a while, I needed to sync with the dev branch again. Rebase is now trying to resolve conflicts that were already resolved in the first rebase.
What am I doing wrong here, and how do I skip the commits that were already synced with git rebase or git merge?
Upvotes: 0
Views: 193
Reputation: 488103
Rebased commits are not marked in any special way.
Rebase works by copying commits, as if by git cherry-pick, then abandoning the original commits in favor of the new and (supposedly?) improved copies. The originals continue to exist, but if there is no way to find them, you won't see them any more.
Branch names are largely irrelevant here. Branch names just let you find particular commits—very useful to humans, and of some use to Git, but once the commits have been found, the names don't matter to Git any more. The way a branch name works is that it contains the hash ID of one particular commit; that commit is, by definition, the last commit in the branch.
What does matter, a lot, is the commit graph. Every commit has its own unique hash ID, which never changes, and always means that commit. In fact, it means that particular commit in every Git, even in a Git repository that does not yet have that commit.[1] Copying a commit to a new-and-improved commit results in a new, different commit with a different hash ID.[2]
Remember that along with the snapshot of your code, each commit carries some metadata, including who made it (author and committer) and when (date-and-time-stamps) and why (log message). In this metadata, you'll also find a parent hash ID. The parent of a commit is the commit that comes before the commit. So given some commit, we can look backwards to find the previous commit. If we let uppercase letters stand in for real hash IDs, we can draw this:
<-H
The commit whose hash ID is H contains the hash ID of (or, for convenience, points to) some earlier commit. Let's call that one G:
<-G <-H
Of course, G points back to F, and so on:
... <-F <-G <-H
A branch name just holds the hash ID of (i.e., points to) the last commit:
... <-F <-G <-H <-- branch
Git uses the branch name to find the last commit. Whatever commit hash ID is stored under the name branch, that's the last commit in the branch. The earlier commits are implicit: they're defined by the graph, which is defined by the various "points-to" relationships stored in the commits. Since they're in commits, they cannot be changed: nothing about any commit can ever be changed.
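To see the name-to-hash and commit-to-parent links for yourself, here is a minimal sketch. It assumes a throwaway repository in a temporary directory; the file name file.txt and the identity settings are made up for illustration, and the branch is literally named branch to match the drawing.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch     # call the initial branch "branch"
git config user.name "Example" && git config user.email "example@example.com"

# Make three commits standing in for F, G and H.
for name in F G H; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done

tip=$(git rev-parse branch)        # the hash ID stored under the name "branch"
parent=$(git rev-parse branch^)    # the parent hash ID recorded inside that commit
git cat-file -p "$tip"             # raw commit: tree, parent, author, committer, message
git log --format='%h %s' branch    # walks H, G, F by following the parent links
```

The name only supplies the starting hash ID; everything else printed by git log comes from the parent links stored in the commits themselves.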
The usual idea behind rebase is to take some set of commits:
...--F--G--H--L   <-- master
            \
             I--J--K   <-- develop
and "transplant" them so that they come after some other commit. To do that, Git needs to turn each snapshot into a set of changes, then apply those changes to some other commit. In this case, we want to copy commit I, which is fine by itself, to a new-and-improved I'. The differences between I and I' will be:
- The parent of I' will be L, not H.
- The snapshot in I' will be the result of applying H-vs-I to whatever is in L.
This process of taking H-vs-I and applying it to L is a git cherry-pick: we check out commit L and run git cherry-pick <hash-of-I> to get it done.
Now that we've made I':
                I'   <-- HEAD
               /
...--F--G--H--L   <-- master
            \
             I--J--K   <-- develop
we need to copy J to J'. That's another git cherry-pick: we want to find what changed between I and J, and apply those changes here to commit I'. When done, we have:
                I'-J'   <-- HEAD
               /
...--F--G--H--L   <-- master
            \
             I--J--K   <-- develop
Now we need to copy commit K, which is the last cherry-pick we need to do. When we're done we have:
                I'-J'-K'   <-- HEAD
               /
...--F--G--H--L   <-- master
            \
             I--J--K   <-- develop
and we've built out our copies. The final step of git rebase is to yank the name develop off commit K and make it point instead to commit K':
                I'-J'-K'   <-- develop (HEAD)
               /
...--F--G--H--L   <-- master
            \
             I--J--K   [abandoned]
Rebase will also re-attach the special name HEAD, which remembers which branch we're on. (This is why we should draw the name HEAD attached to some branch like this, except when Git is in its detached HEAD mode, which is the case during the rebase while we're copying commits, one commit at a time. In detached HEAD mode, the special name HEAD contains the raw hash ID of a commit, rather than containing the name of the branch.)
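The whole sequence above (detach HEAD at L, cherry-pick I, J and K, then move the name develop) can be replayed by hand. This is a sketch under some assumptions, not a description of what git rebase does internally: it uses a throwaway repository, a hypothetical commit helper that adds one new file per commit so that nothing conflicts, and an extra branch named develop-auto so that a real git rebase can be compared against the manual version.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: each commit adds its own file, so no step conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit F; commit G; commit H
git checkout -q -b develop          # develop starts at H
commit I; commit J; commit K
git checkout -q master
commit L                            # master moves on to L
git branch develop-auto develop     # second copy of I-J-K, for comparison

old_tip=$(git rev-parse develop)    # the original K

# The rebase, by hand:
git checkout -q --detach master               # detached HEAD at L
git cherry-pick master..develop >/dev/null    # copy I, J, K to I', J', K'
git checkout -q -B develop                    # yank "develop" to K', re-attach HEAD

# The same job, done by git rebase itself:
git rebase -q master develop-auto

git rev-parse 'develop^{tree}' 'develop-auto^{tree}'   # same snapshot both ways
```

The original K is still in the repository under its old hash ID ($old_tip); only the name develop has moved.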
[1] This particular magic is achieved by making the hash ID a checksum of the contents of the commit. To help ensure uniqueness and prevent spoofing, the checksum is a cryptographic one. But this means that if you ever try to change the contents of any commit, even just one single bit, the result is not a changed commit, but rather a new commit with a different hash ID. The original commit remains, with the original hash ID.
[2] If you copy a commit and make no changes at all, so that the new commit is bit-for-bit identical to the original commit, you get back the original hash ID again. So in this case there's no actual copy: but that's OK, nothing changed. It is possible to do this in some cases, and many kinds of rebase will do it automatically when possible. The --force option to git rebase tells Git: *Even if you could leave the commit absolutely identical, copy it anyway: change something by giving the new copy the current date-and-time as its committer timestamp, rather than re-using the original commit's committer timestamp.*
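Both footnotes can be checked with the low-level git commit-tree command, which builds a commit object directly from a tree. The sketch below assumes a throwaway repository and pins both timestamps through the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables; the message text and the dates are arbitrary.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo "hello" > file.txt
git add file.txt
tree=$(git write-tree)              # the snapshot, stored as a tree object

# Pin both timestamps so that every input to the commit is identical.
export GIT_AUTHOR_DATE="1577836800 +0000"
export GIT_COMMITTER_DATE="1577836800 +0000"
first=$(git commit-tree "$tree" -m "same content")
second=$(git commit-tree "$tree" -m "same content")

# Change only the committer timestamp: the contents differ, so the hash ID differs.
third=$(GIT_COMMITTER_DATE="1577923200 +0000" git commit-tree "$tree" -m "same content")

echo "first:  $first"
echo "second: $second"
echo "third:  $third"
```

The first two hash IDs come out identical because the commit contents are bit-for-bit identical; the third differs only because its committer timestamp does.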
During each copy-a-commit (git cherry-pick) step, you can get merge conflicts. This happens because the actual mechanism behind cherry-pick is Git's usual three-way merge.
Let's look at a more typical merge, where we actually start with two branches that have diverged since some common starting point:
          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2
We'll run git merge branch2 while on branch1 like this. Git will:
1. Find the merge base: a shared commit, one that's on both branches, and the best such one. To do that, Git starts from where we are now (commit J, as pointed to by branch1, to which HEAD is attached) and works backwards while, at the same time, starting at the other commit we name (L, the tip of branch2) and working backwards from there. This is a sort of graph-flooding algorithm that follows the arrows backwards.
   You can think of this as: Temporarily paint all commits green, starting from J and working backwards. Likewise, paint all commits red, starting from L and working backwards. When the two paints start mixing, we've hit a merge base.
   In this case, commit H is clearly the merge base. (There are more complicated graphs where the merge base is not so obvious, but as long as the two branches eventually meet up, there will be some merge base. In some ugly cases, there can be more than one merge base, but we'll ignore that here!)
2. Now that we have three commits, do the merging process: merge as a verb, as I like to put it. We'll look at this more closely in a moment.
3. If all goes well, make a merge commit. This uses the word merge as an adjective, modifying the word commit. We (and Git) can even noun this word and just say a merge, meaning a merge commit.
   (If things don't go so well, Git stops, leaving you with a merge conflict. You must fix this on your own and then run git merge --continue, or just git commit, to make the final merge commit.)
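Step 1 can be tried on its own with git merge-base, which prints the hash ID of the merge base of two commits. This sketch assumes a throwaway repository that reproduces the drawing above; the commit helper and the one-file-per-commit layout are made up for illustration.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: each commit adds its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git checkout -q -b branch2          # branch2 forks off at H
commit K; commit L
git checkout -q branch1
commit I; commit J

base=$(git merge-base branch1 branch2)
git log -1 --format='merge base: %s' "$base"    # prints "merge base: H"
```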
The result is a commit that works like usual: it has a snapshot as usual, and metadata with a log message as usual. The only thing un-usual about it is that instead of listing one parent hash ID, it lists both parents, so that we draw it like this:
          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2
Commit M is our merge commit; Git makes the name branch1 advance to point to the new commit, as usual, so now the name branch1 identifies commit M as the tip of the branch.
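The two-parent shape of M is easy to confirm. The sketch below assumes a throwaway repository with the same G-H / I-J / K-L layout as the drawing; the commit helper is hypothetical and puts every commit in its own file so that the merge cannot conflict.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: each commit adds its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git checkout -q -b branch2
commit K; commit L
git checkout -q branch1
commit I; commit J

j=$(git rev-parse branch1)          # tip of branch1 before the merge
l=$(git rev-parse branch2)          # tip of branch2

git merge -q --no-edit branch2      # makes merge commit M and advances branch1

git log -1 --format='%P' branch1    # two parent hash IDs: J first, then L
```

The first parent is the commit we were on (J) and the second is the commit we named (L); the name branch2 does not move.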
Let's look a little bit more closely at the merge process in step 2 above.
We already know that each commit holds a snapshot. We also know that we can have Git compare any two snapshots:
git diff <hash1> <hash2>
Git will look at all the files in the first, left-side commit, and all the files in the second, right-side commit. When files match up, Git will inspect their contents. If the contents match up, Git says nothing about those files. If the contents differ, Git compares the contents and figures out a series of changes that, if applied to the left-side file, produce the right-side file.
Some files don't necessarily get matched-up at all. The left-side file might just be deleted, and a new file might be added on the right. Those show up in the diff as deletes and adds. We can also ask Git to guess at renames, and it will try to pair up an add on the right with a delete on the left: if the files are similar enough—especially, if the contents are 100% identical—Git can match them up and say that the left-side file is renamed to become the right-side file, with maybe some content changes too.
What merge as a verb does is to run these kinds of diffs, but to run two of them. We start with the merge base—which is on both branches—and diff it against each of the two branch tips:
git diff --find-renames <hash-of-base> <hash-of-HEAD> # what we changed
git diff --find-renames <hash-of-base> <hash-of-other> # what they changed
For any file that didn't change, it's the same in all three commits and we can just use any of them. For files that did change, the merge process should combine the two changes, and apply the combined changes to what's in the merge base. That way, we keep our changes and add their changes.
Merge conflicts arise when Git can't do this combining on its own. Git fails to combine changes when:
- we and they both changed the same lines of the same file, in different ways.
This "same lines" thing gets extended slightly: if we touched, say, lines 5 through 9, and they touched lines 10 and 11, that's a conflict too. If they touched lines 11 and 12, so that line 10 is unchanged, there's no conflict. There is no strong theoretical reason for this, it just proves to work pretty well in practice. But note that Git doesn't understand the stuff it's merging, at all: it's just following simple rules about combining lines.
In any case, wherever Git is able to combine our changes and their changes, Git applies the combined changes to what's in the merge base. That gives the right result. So merge-as-a-verb means:
- find the merge base;
- diff the merge base against our commit, and against their commit;
- combine the two sets of changes and apply the combined result to the snapshot in the merge base;
and it all goes well when Git thinks it combined the changes correctly. That's true even if Git didn't actually combine them correctly (in some deeper sense of correct), which happens sometimes. For instance, if we had an unused variable declared in the merge base, and on the left, we removed the declaration, and on the right, they used the variable, Git might be able to combine these. The final result uses an undeclared variable! (This is less likely in some programming languages than in others.)
And, of course, we get merge conflicts when Git can't combine the changes.
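The line-combining rules can be tried on a single file with git merge-file, which runs a three-way merge on three files and does not need a repository. This is a sketch with made-up file names and contents: one pair of edits is far apart and combines cleanly, the other pair touches the same line and conflicts.

```shell
work=$(mktemp -d) && cd "$work"

# The merge base: ten numbered lines.
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > base.txt

# "Ours" changes line 2 and "theirs" changes line 9: far apart, so combinable.
sed 's/^line 2$/line 2 (ours)/'   base.txt > ours.txt
sed 's/^line 9$/line 9 (theirs)/' base.txt > theirs.txt
git merge-file -p ours.txt base.txt theirs.txt > merged.txt \
    && clean_status=0 || clean_status=$?

# Now both sides change line 2, in different ways: Git cannot pick one.
sed 's/^line 2$/line 2 (theirs)/' base.txt > theirs2.txt
git merge-file -p ours.txt base.txt theirs2.txt > conflicted.txt \
    && conflict_status=0 || conflict_status=$?

echo "far-apart edits:  exit status $clean_status"
echo "same-line edits:  exit status $conflict_status"
cat conflicted.txt      # shows the <<<<<<< / ======= / >>>>>>> conflict markers
```

The argument order is current file, base file, other file. A zero exit status means the changes combined; a positive one is the number of conflicts left in the output.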
git cherry-pick
Earlier, we described git cherry-pick as "copying" a commit: find out what changed, and apply those changes to a different snapshot. But in fact, Git uses its merge engine to do a full merge.
The merge base of this full merge is simply the parent of the commit we want to copy. The --ours commit is the current (HEAD) one. The --theirs commit is the commit we want copied. So if we have:
                I'   <-- HEAD
               /
...--F--G--H--L   <-- master
            \
             I--J--K   <-- develop
and we're trying to copy J to make J', we pick commit I as the merge base and commit J as the --theirs commit. Git then does a git diff from commit I to commit I' to see what "we" changed.
What "we" changed, to get from I to I', is whatever went into I' because of H-vs-L! That happened in the first cherry-pick, when we were combining H-vs-L (as --ours, even though it's theirs) with H-vs-I (as --theirs, even though it is our commit that we're rebasing).
What "they" changed, to get from I to J, is of course the changes we want to copy: our changes from the original develop branch. Combining these two sets of changes will work, but may force us to resolve conflicts.
When we do have to resolve conflicts, we git add our resolutions as usual, and run git whatever --continue to resume. If we were using git cherry-pick directly, we'd use git cherry-pick --continue, but since we're using git rebase, we use git rebase --continue. Git will finish the cherry-pick by making a non-merge, ordinary commit, in the usual way.
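When a rebase stops on a conflict, the three inputs to the merge sit in the index as stages 1, 2 and 3, so the swapped meaning of "ours" and "theirs" can be inspected directly. The sketch below assumes a throwaway repository with a single made-up file, file.txt, that master (commit L) and develop (commit I) both change after H.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "from H" > file.txt; git add file.txt; git commit -q -m H
git checkout -q -b develop
echo "from I (develop)" > file.txt; git commit -q -am I
git checkout -q master
echo "from L (master)" > file.txt; git commit -q -am L

git checkout -q develop
git rebase master >/dev/null 2>&1 || true   # stops: copying I conflicts with L

base=$(git show :1:file.txt)     # stage 1: merge base, the parent of I (that is, H)
ours=$(git show :2:file.txt)     # stage 2: --ours, the HEAD commit (L, from master)
theirs=$(git show :3:file.txt)   # stage 3: --theirs, the commit being copied (I)
echo "base:   $base"
echo "ours:   $ours"
echo "theirs: $theirs"

# Resolve, stage, and resume. GIT_EDITOR=true accepts the existing log message.
echo "from L (master) and I (develop)" > file.txt
git add file.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
```

During the rebase, "ours" is the master side and "theirs" is our own develop commit, which is the reversal described above.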
Rebasing can see the same conflicts again in multiple ways. It all depends on which commits you copy and what conflicts you get and how you resolve them.
Let's go back to the original diagram of a rebase, and look at the "after" drawing:
                I'-J'-K'   <-- develop (HEAD)
               /
...--F--G--H--L   <-- master
            \
             I--J--K   [abandoned]
We might have resolved some conflicts when we made the I'-J'-K' chain. That would be OK if I-J-K are really gone, but what if the drawing is incomplete? Suppose the original drawing should have been:
...--F--G--H--L   <-- master
            \
             I--J--K   <-- develop
                    \
                     M--N--O   <-- feature
When we copy I-J-K to new I'-J'-K' and yank the name develop up to point to the last copied commit, we get:
                I'-J'-K'   <-- develop (HEAD)
               /
...--F--G--H--L   <-- master
            \
             I--J--K
                    \
                     M--N--O   <-- feature
Commits I-J-K are not abandoned. They're right there on feature, where they always have been! Let's say we now merge feature into master with a regular merge:
                I'-J'-K'   <-- develop (HEAD)
               /
...--F--G--H--L---------------P   <-- master
            \                /
             I--J--K--M--N--O   <-- feature
We make the merge in the usual way, by diffing the merge base H against L to see what we changed on master, and diffing H against O to see what they changed on feature. We combine the changes, apply the combined result to H, and get commit P, which records both parents L and O.
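That the original I-J-K commits survive on feature after develop has been rebased can be checked with git merge-base --is-ancestor. This sketch assumes a throwaway repository with the same shape as the drawings above (before the merge that makes P); the commit helper is hypothetical and gives every commit its own file, so this particular rebase does not conflict.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: each commit adds its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit F; commit G; commit H
git checkout -q -b develop; commit I; commit J; commit K
git checkout -q -b feature; commit M; commit N; commit O   # feature forks off at K
git checkout -q master; commit L

old_k=$(git rev-parse develop)     # the original K, before the rebase
git rebase -q master develop       # copies I-J-K to I'-J'-K' and moves develop

# The original K is still an ancestor of feature, but no longer of develop.
git merge-base --is-ancestor "$old_k" feature && on_feature=yes || on_feature=no
git merge-base --is-ancestor "$old_k" develop && on_develop=yes || on_develop=no
echo "original K reachable from feature: $on_feature"
echo "original K reachable from develop: $on_develop"

git log --oneline --graph master develop feature   # copies and originals side by side
```

The graph output lists I, J and K twice, once per set of hash IDs, and nothing in either copy records that it is a duplicate of the other.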
If we now go to rebase develop onto master, we're likely to see some conflicts. The correct resolution is probably just to drop commits I'-J'-K' entirely, as their changes are already incorporated into master through I-J-K: we should just have:
...--F--G--H--L---------------P   <-- master, develop (HEAD)
            \                /
             I--J--K--M--N--O   <-- feature
as our final result.
There are lots of ways to have very similar conflicts, and because git rebase repeatedly uses cherry-pick, you may in some cases have to resolve the same conflict multiple times even in one rebase. This happens when you choose their change, or include parts of it, and your own later commits in the cherry-pick sequence touch the same lines where you took their version or altered your changes.
Without seeing the actual commit graph and the various snapshots that Git sees, all we can say here is that this is a general problem. Git does not know that a rebased commit was rebased before: it's just a commit, like any other commit. Git is going to cherry-pick it by using its parent as a merge base and diffing that merge base against your current/HEAD commit, and against the child commit you're cherry-picking. If the changes touch the same lines, or abut, Git will declare a conflict and make you resolve it. That's really all there is to it: you have to figure out whether the current commit should be copied at all, and if so, how. You have to figure out if an earlier version of that commit has been incorporated into whatever commit you're using as HEAD in your rebase. It's all up to you: Git only knows simple text rules.
Upvotes: 1