Cherry picking problems

Question

Branches and commits

mb: A -> B -> C -> G -> H -> I
               \
db:             -> D -> E -> F

There is progress on both db (develop branch) and mb (master). Rebasing is what I usually do. Always, actually. But when I was on "F", rebasing against mb was not possible (too many weird and strange conflicts). Solution: cherry pick!

I created a new branch from "mb" - let's call it "nmb" (new master branch) and in this branch I cherry picked commits D, E and F from "db". (This was the easy way around, since there were fewer commits in "db" than in "mb".)

At this point, I didn't analyze the git log, which would probably have made my life easier. All files were in place and the application was working as expected, so I kept on working...

Now - the exact same scenario is repeating.

I can't rebase "nmb" against "mb" and the log looks awkward:

F' 10/6 -14 12:23:43
E' 10/6 -14 10:03:30
D' 10/6 -14 09:54:10
I  ...
H  ...
G  ...
F  10/6 -14 12:23:43
E  10/6 -14 10:03:30
D  10/6 -14 09:54:10
C  ...
B  ...
etc

D' and D, E' and E, and F' and F share the same date/time, message and files, but have different commit hashes.

Is this the result of the cherry pick and expected? One commit for the original and one commit for the cherry pick?

Is git log showing the correct data?

I'm really lost in git land.

I should add that I'm a newbie and not very familiar with all of git's nice features. This is probably a huge lapse on my side and the easiest thing would be to save the files elsewhere and start over...

torek · Accepted Answer

Is this the result of the cherry pick and expected? One commit for the original and one commit for the cherry pick?

Yes: that's how cherry-picking works. It copies commits: the copy gets the same message and author timestamp and makes the same changes as the original, but it has a different parent and therefore a different hash. When you were on nmb and did git cherry-pick D, git essentially did git show D to see what you did, applied the same changes, and made new commit D', leaving D around.
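To see this for yourself, here is a minimal sketch in a throwaway repository. The branch names follow the question; the file names and the user identity are made up for the illustration, and only one commit (D) is copied to keep it short.

```shell
#!/bin/sh
# Throwaway repository; file names and identity are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mb   # call the first branch "mb"
git config user.name "Example User"
git config user.email "user@example.com"

# C: the commit both branches share
echo base > base.txt
git add base.txt
git commit -q -m "C"

# D: a commit that exists only on db
git checkout -q -b db
echo d > d.txt
git add d.txt
git commit -q -m "D"

# G: mb moves on independently
git checkout -q mb
echo g > g.txt
git add g.txt
git commit -q -m "G"

# nmb starts at mb's tip and receives a copy of D
git checkout -q -b nmb
git cherry-pick db > /dev/null

orig_hash=$(git rev-parse db)
copy_hash=$(git rev-parse nmb)
orig_meta=$(git log -1 --format='%s | %ad' db)
copy_meta=$(git log -1 --format='%s | %ad' nmb)

# Same subject and author date, different hashes
echo "D : $orig_hash  $orig_meta"
echo "D': $copy_hash  $copy_meta"
```

The two lines printed at the end should show identical subjects and author dates next to two different hashes, which is the same pattern as the D/D', E/E', F/F' pairs in your log.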

Is git log showing the correct data?

Assuming you've asked it to show you several branches, yes.

Rebasing is what I usually do.

All "rebase" is, is cherry-pick on steroids (as it were), followed by one more trick.

Let's redraw your first picture with the arrows going the other way, because that's really how they work: a branch points to the tip-most commit, and each commit points back to its parent(s). (For that matter, a merge is just a commit that points back to at least two parents.) Git has to do things this way because all commits are permanent¹ and unchangeable—if you try to change anything, you get a new, different commit.

A <- B <- C - G - H - I   <-- mb
            \
              D - E - F   <-- db

(I stopped bothering with the arrow directions after C, but they're basically leftward.)

If you want to rebase db onto mb, you can tell git:

git checkout db
git rebase mb

What rebase does is look for where your current branch (db) joins up with the target branch (mb), and take all the commits after that—in this case, D, E, and F—and cherry-pick them. To do the cherry-picking, it makes a new branch—it actually uses an anonymous one, but let's just call it tmp—and then does the git cherry-pick commands. If all goes well, the result is:

                          D'- E'- F'  <-- tmp
                        /
A <- B <- C - G - H - I   <-- mb
            \
              D - E - F   <-- db

after which git simply peels the db label off the old commit F and pastes it where the tmp label is:

                          D'- E'- F'  <-- db
                        /
A <- B <- C - G - H - I   <-- mb
            \
              D - E - F   <-- [old db, now abandoned]

Most git commands don't show you the old, abandoned commits, but since they're semi-permanent,¹ they're still in there.
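The rebase sequence drawn above can be reproduced in a scratch repository. This is only a sketch under assumptions: commit_file is a hypothetical helper defined here, the file names are invented, and the history is shortened to C, D, E and G.

```shell
#!/bin/sh
# Scratch repository; commit_file and the file names are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mb
git config user.name "Example User"
git config user.email "user@example.com"

commit_file() {   # hypothetical helper: one new file per commit
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file C             # shared history
git checkout -q -b db
commit_file D             # db-only commits
commit_file E
git checkout -q mb
commit_file G             # mb moves on

git checkout -q db
old_tip=$(git rev-parse db)       # original E
git rebase mb > /dev/null 2>&1    # copies D and E on top of G
new_tip=$(git rev-parse db)       # the copy, E'

echo "db before rebase: $old_tip"
echo "db after rebase:  $new_tip"

# The label moved, but the old commit object is still in the repository
git cat-file -t "$old_tip"
```

The db label now points at the copied commits on top of mb, while the original tip can still be looked up by its hash even though no branch name leads to it any more.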

¹Commits (and any other objects in the repository, in fact) stick around forever as long as they're "reachable" via the commit graph. Branch-names, pointing to the tip of a branch, make that particular commit reachable. Since each commit carries with it its parent ID(s), those commits are reachable too, and their parents in turn, and so on. It's only unlabeled objects that are unreachable.

When git updates a branch, it normally keeps a log of "where the branch used to point". These reference logs ("reflogs" for short) count, in terms of keeping commits reachable, but git log ignores them unless you tell it to use the reflogs (git log -g).

Each reflog entry has a time stamp for when it was made. The reflog entries eventually expire: by default after 90 days, or after 30 days for entries pointing to commits that are no longer reachable from the branch's current tip. Once the reflog entries are gone, these commits become truly unreferenced, and then git gc will eventually delete them.
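As a rough illustration of the reflog keeping abandoned commits findable, the sketch below rebases a one-commit db branch and then re-labels the old commit. The branch name old-db and the file names are assumptions made up for this example.

```shell
#!/bin/sh
# Scratch repository; old-db and the file names are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mb
git config user.name "Example User"
git config user.email "user@example.com"

echo c > c.txt
git add c.txt
git commit -q -m "C"

git checkout -q -b db
echo d > d.txt
git add d.txt
git commit -q -m "D"

git checkout -q mb
echo g > g.txt
git add g.txt
git commit -q -m "G"

git checkout -q db
old_tip=$(git rev-parse db)       # original D
git rebase mb > /dev/null 2>&1    # db now points at the copy, D'

# Plain log walks parents from the new tip and never reaches the old D
git log --oneline db

# Walking the reflog instead lists where db used to point
git log -g --oneline db

# db@{1} is the value db had just before its last update; give it a name again
git branch old-db "db@{1}"
git log -1 --oneline old-db
```

Once a branch name such as old-db points at the old commit, it is reachable again and is no longer at the mercy of reflog expiry.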
