Dan Jones

Reputation: 1440

How do I copy a block of code from one git branch to another, while retaining commit history?

I'm working on two features in the same repo at once, and I've realized that a function in one branch would be useful in the other branch (in the same file). The first branch isn't 100% done, so I don't want to merge/rebase that one onto the second. I really only need one small block of code.

Obviously, I could just copy and paste, but that wouldn't retain the commit history, and I imagine it would potentially lead to merge conflicts later on when I merge both of them into master.

The function I need is spread out over multiple commits that also have other changes in them, so I can't easily cherry-pick the commits that make up that block of code.

I also can't do an interactive rebase and change the history of the first branch (making it easier to cherry-pick the one bit of code I need), because, besides being a huge amount of code, I've already pushed my code to GitHub, and it's waiting on code review before I can merge into master, so I shouldn't be rewriting history while my colleagues are reviewing my code.

So, what are my best options here? This feels like I need some crazy git-fu in this situation to keep things nice and clean.

The only other option I can think of is to do the interactive rebase locally, without pushing to GitHub, then cherry-pick the one commit that includes the function I need, and then reset my first branch back to the remote branch. But that feels like a less-than-ideal solution, and I'm not sure it would really be much different from simply copying and pasting the code.

Upvotes: 1

Views: 930

Answers (3)

max630

Reputation: 9238

If rebasing the original branch, as in the other answer, is not appropriate, you can copy and paste the function into a file with a different name, so that there will be no complex conflict at a future merge. You will only need to remove the temporary file afterwards. Depending on your language, you may be reminded to do so by a compilation error caused by the duplicated definition.

Upvotes: 1

Yawar

Reputation: 11627

Extract that function's implementation into a separate branch, get that merged into the integration branch, then rebase on top of the latest integration branch. After a little bit of conflict resolution in the branch that originally implemented the function, you'll have the function available in both branches.

Upvotes: 0

torek

Reputation: 488965

Obviously, I could just copy and paste, but that wouldn't retain the commit history ...

Correct. The history is the commits; copying code, or files, and making new commits, makes new history.

and I imagine, it would potentially lead to merge conflicts later on when I merge both of them into master.

Maybe. With luck, not. The key here is that merging works by combining two separate diffs. What you have now looks like this:

          A--B--C--D   <-- yourbranch
         /
...--o--*              <-- master, origin/master

(replace master with branch as appropriate). Here "your commits" are the ones labeled A through D, making up the A--B--C--D chain.

Here is what you get if you copy (all of) your commits:

          A--B--C--D   <-- yourbranch
         /
...--o--*              <-- master, origin/master
         \
          A'-B'-C'-D'  <-- yourcopy

Now, here's how git merge works. We assume here that you are on master and are merging yourbranch:

tip1=$(git rev-parse master)       # commit *
tip2=$(git rev-parse yourbranch)   # commit D
base=$(git merge-base "$tip1" "$tip2")

Once this is done, tip2 points to commit D, while tip1 points to the tip of the current branch master, which is commit *. Meanwhile base points to the merge base of these two commits. That's the first commit where the two branches join up, which is ... commit * again!

A merge that is not forced one way or another (--no-ff or --ff-only) will check whether the merge base is the same as the current tip (tip1, or commit *). Since it is, that merge will become a fast-forward instead, which is not really a merge at all.

A merge that is forced --no-ff goes ahead and makes a real merge, even though it's trivial. In this case we will get:

          A--B--C--D   <-- yourbranch
         /          \
...--o--*------------M   <-- master, origin/master
         \
          A'-B'-C'-D'  <-- yourcopy

(A merge that is forced --ff-only still checks. If it cannot do a fast-forward, it simply fails.)
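
To see the fast-forward check in action, here is a minimal sketch that builds the graph above in a throwaway repository and compares the merge base with the tip of master. The file name file.txt and the committer identity are made up for the illustration, and the script assumes git and mktemp are available.

```shell
set -e
# Scratch repository; file.txt and the identity below are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

# Commit *
echo base > file.txt
git add file.txt
git commit -q -m "*"

# Commits A through D on yourbranch
git checkout -q -b yourbranch
for c in A B C D; do
    echo "change $c" >> file.txt
    git commit -q -a -m "$c"
done
git checkout -q master

tip1=$(git rev-parse master)
tip2=$(git rev-parse yourbranch)
base=$(git merge-base "$tip1" "$tip2")

# If the merge base is the current tip, a plain "git merge" can fast-forward.
if [ "$base" = "$tip1" ]; then
    merge_kind="fast-forward"
else
    merge_kind="real merge"
fi
echo "$merge_kind"
```

In this graph the merge base is commit * itself, so the script reports a fast-forward; running git merge --no-ff yourbranch instead would create the merge commit M.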

For the moment, let's assume we have a forced merge M.

Now, suppose you've taken your copy yourcopy and made another commit E:

          A--B--C--D   <-- yourbranch
         /          \
...--o--*------------M   <-- master, origin/master
         \
          A'-B'-C'-D'-E  <-- yourcopy

You may now ask Git to git merge yourcopy, which does all this same stuff:

tip1=$(git rev-parse master)       # commit M
tip2=$(git rev-parse yourcopy)     # commit E
base=$(git merge-base $tip1 $tip2)

Where do master and yourcopy join up? Follow the commits backwards until the two streams join: that's commit * again. So the merge base is commit *. Now we have a non-trivial merge, so Git has to make two diffs:

git diff $base $tip1     # * vs M: basically, A+B+C+D as a patch
git diff $base $tip2     # A'+B'+C'+D'+E as a patch

The merge code now tries its best to combine these two patches, taking each change once. But A+B+C+D is exactly the same as A'+B'+C'+D', so the only difference between these two patches is basically just commit E. In "good" cases, which are surprisingly common in the end, there are few or even no merge conflicts here.
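
The claim that each change is taken once can be checked with a small experiment. This is only a sketch under some assumptions: the file name and its contents are invented, the copies are made with git cherry-pick -x (so the copied commits are guaranteed to get new hashes), and E deliberately touches a different part of the file than A through D do. If E edited the same lines, you could still get a conflict.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Commit *: a ten-line file (illustrative content)
for i in 1 2 3 4 5 6 7 8 9 10; do echo "line $i"; done > file.txt
git add file.txt
git commit -q -m "*"

# yourbranch: A, B, C, D each append one line
git checkout -q -b yourbranch
for c in A B C D; do
    echo "change $c" >> file.txt
    git commit -q -a -m "$c"
done

# yourcopy: copies A' through D', then E, which edits the top of the file
git checkout -q -b yourcopy master
git cherry-pick -x master..yourbranch >/dev/null
{ echo "change E"; cat file.txt; } > file.tmp && mv file.tmp file.txt
git commit -q -a -m "E"

# master: forced real merge M, then a second merge that brings in yourcopy
git checkout -q master
git merge -q --no-ff -m "M" yourbranch
git merge -q -m "M2" yourcopy

copies_of_A=$(grep -c "^change A$" file.txt)
copies_of_E=$(grep -c "^change E$" file.txt)
echo "A appears $copies_of_A time(s), E appears $copies_of_E time(s)"
```

Both merges complete without conflicts here, and the merged file contains the line from A once, not twice, plus the new line from E.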

The same holds even if you only copy some of your commits:

          A--B--C--D   <-- yourbranch
         /          \
...--o--*------------M   <-- master, origin/master
         \
          B'-D'-E        <-- yourcopy

The diffs on one "side" are basically A+B+C+D, while the diffs on the other "side" are B'+D'+E which is really just B+D+E. Git probably (and usually) notices that B' and D' are already in there, and takes just the changes that amount to E.

All of this remains true even if the original "merge" is a fast-forward. We just redraw the intermediate commit graphs as:

          A--B--C--D   <-- yourbranch, master, origin/master
         /
...--o--*
         \
          A'-B'-C'-D'-E  <-- yourcopy

The merge base remains * and everything works as before. If you make a merge now (with current branch set to master), you get:

          A--B--C--D        <-- yourbranch, origin/master
         /          \
...--o--*            ---M   <-- master
         \             /
          A'-B'-C'-D'-E     <-- yourcopy

You get a similar graph, with the A'-B'-C'-D'-E sequence in it, if you make a merge M2 atop merge M. In both cases, the one annoyance is that the copied commits remain in the repository, so that you see them twice when you run git log.

The above is too simple

Of course, in a case like this, you could just add E atop D:

                     E   <-- feature2
                    /
          A--B--C--D   <-- yourbranch
         /
...--o--*              <-- master, origin/master

which would be the way to go: there are no copied commits and everything is nice (though feature2 now depends on yourbranch).

What you really have looks more like this, no doubt:

          A--B--C--D   <-- yourbranch
         /
...--o--*              <-- master, origin/master
         \
          E--F--G      <-- feature2

You have now noticed that you'd like some or all of the code from the A-B-C-D sequence in feature2, as if you'd rebased feature2 atop yourbranch. Again, though, you can just git cherry-pick as much of it as you want, making A', B', C', and/or D' commits. You can restructure (rebase -i and re-organize) these commits however you like, but let's just draw this with B and D added as copies:

          A--B--C--D   <-- yourbranch
         /
...--o--*              <-- master, origin/master
         \
          E--F--G--B'-D'   <-- feature2

As before, merging will try not to duplicate the code (although the extra copied commits will still show up in the history). But let's say you later get the go-ahead to rebase feature2 atop your (merged-in) A-through-D. What happens when you do that is even better, because git rebase, which works by copying commits just as git cherry-pick does, actually searches for already-copied commits and leaves those out. So let's say master and origin/master get fast-forwarded to include D, or D gets merged in. It does not matter which happens, as long as commits A through D are now all on master (even if via a merge commit):

          A--B--C--D      <-- yourbranch
         /          \
...--o--*------------M    <-- master, origin/master
         \
          E--F--G--B'-D'  <-- feature2

Now you check out feature2 and run git rebase master. Your Git finds commits E-F-G-B'-D' and gets set to copy them, but as it copies, it checks: is this commit already there? This check is not by commit hash (that check would fail, since a copy always has a different hash) but rather by patch ID (see the git patch-id documentation for details).
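
If you want to convince yourself that a cherry-picked copy keeps its patch ID while getting a new commit hash, something like the following sketch should do. The repository layout and file contents are invented for the example; the only Git features relied on are git cherry-pick, git show, and git patch-id, which reads a patch on standard input and prints the patch ID followed by the commit ID.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Commit *
for i in 1 2 3 4 5 6 7 8 9 10; do echo "line $i"; done > file.txt
git add file.txt
git commit -q -m "*"

# Commit B on yourbranch appends a line
git checkout -q -b yourbranch
echo "change B" >> file.txt
git commit -q -a -m "B"

# feature2 has its own commit G (edits the top of the file), then a copy B'
git checkout -q -b feature2 master
{ echo "change G"; cat file.txt; } > file.tmp && mv file.tmp file.txt
git commit -q -a -m "G"
git cherry-pick yourbranch >/dev/null

hash_B=$(git rev-parse yourbranch)
hash_Bcopy=$(git rev-parse feature2)

# First field of the git patch-id output is the patch ID
id_B=$(git show "$hash_B" | git patch-id | cut -d' ' -f1)
id_Bcopy=$(git show "$hash_Bcopy" | git patch-id | cut -d' ' -f1)

echo "B:  commit $hash_B  patch-id $id_B"
echo "B': commit $hash_Bcopy  patch-id $id_Bcopy"
```

The two commit hashes differ because B' has a different parent, but the patch IDs match, since the patch ID is computed from the diff with line numbers ignored.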

Since B' is a copy of B (and therefore has the same patch ID), and D' is a copy of D, rebase copies only E-F-G, giving:

          A--B--C--D             <-- yourbranch
         /          \
...--o--*------------M           <-- master, origin/master
         \            \
          \            E'-F'-G'  <-- feature2
           \
            E--F--G--B'-D'       [old feature2, now abandoned]

Cover up the "abandoned" commits (block them off your screen with your hand, for instance :-) ) and it looks as though Git has magically done just what you would want. (It's not exactly magic—it's just patch IDs—but it is probably just what you want. Of course, if you are forced to rework B and/or D, the reworked ones that eventually go in may no longer match the copies you made, and then you will really see the lack of magic.)
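
The whole rebase scenario can be replayed in a scratch repository. This is a sketch, not a recipe for your real branches: to keep every cherry-pick conflict-free, each commit here just adds its own file, and commit_file is a helper invented for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper for this sketch: each commit adds one file named after it.
commit_file() {
    echo "content of $1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file base                      # commit *

git checkout -q -b yourbranch
for c in A B C D; do commit_file "$c"; done

git checkout -q -b feature2 master
for c in E F G; do commit_file "$c"; done
# Copy B (yourbranch~2) and D (yourbranch) as B' and D'
git cherry-pick yourbranch~2 yourbranch >/dev/null

# Later, A through D land on master via a real merge M
git checkout -q master
git merge -q --no-ff -m "M" yourbranch

# Rebase feature2 onto master; B' and D' are left out as already applied
git checkout -q feature2
git rebase -q master

# Subjects of the commits that are on feature2 but not on master, oldest first
rebased=$(git log --reverse --format=%s master..feature2 | tr -d '\n')
echo "$rebased"
```

After the rebase, feature2 consists of copies of E, F, and G on top of M, while the files added by B and D are still present because they now come from master. Recent versions of Git may also print a warning that previously applied commits were skipped.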

Upvotes: 1
