Git rebase while maintaining the latest version of a file in one branch

Question

I have a file in my local branch, and I want to be able to rebase onto origin/main while making sure that, after the rebase, this file in my local branch is exactly the same as it is right now.

Is there a way to do a rebase and guarantee that? Even better if during the rebase I don't have to answer any questions or resolve any conflicts for this file.

torek · Accepted Answer

TL;DR

Use a temporary tag to mark a commit that has the desired copy of the file. Then, use git rebase -i and insert x commands to run a short script after each pick. You have a choice of what, precisely, to put in this script, but this (untested) might be what you want:

#! /bin/sh
git checkout temp-tag -- path
git diff-index --quiet HEAD || git commit --amend --no-edit

Once this is all done, remove the temporary tag (and the script; it's not like it was difficult to write, and it has the tag and path hardcoded).

Long

To make sense of this answer, start by memorizing this fact: in Git, files aren't really in branches. Files are really in commits.

Commits are contained in branches—or in other words, found by using branch names, then working from commit to commit, backwards, through the links that Git stores in each commit. So you can go from branch name to commit and thence to file. But that "to commit" step is critical, because each commit has a full snapshot of every file.
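To see that a file lives in commits rather than in a branch, here is a minimal sketch using a throwaway repository. The path `notes.txt`, the identity, and the commit messages are made up for the demo. It reads the same path out of two different commits on the same branch:

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "first snapshot"

echo "version 2" > notes.txt
git commit -q -am "second snapshot"

# Same branch, same path, two different commits, two different contents.
old=$(git show HEAD~1:notes.txt)
new=$(git show HEAD:notes.txt)
echo "parent commit has: $old"
echo "tip commit has:    $new"
```

The `commit:path` syntax is what makes "which copy of the file?" a question about commits, not branches.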

Next, let's look at what git rebase does and how it does it. Remember that Git is all about commits, and each commit has a unique hash ID. No part of any existing commit can ever be changed. So, since rebase literally can't change any of the existing commits, it necessarily has to work by copying the old (and lousy, or at least inadequate in some way) commits to new-and-improved commits. These new-and-improved commits are the same as the old commits in some way, and different in some way.

Each commit, as found by its unique hash ID, has two parts:

- There's the main data of a commit: the source code snapshot that goes with this commit. These aren't changes. The snapshot has each file exactly as it should appear if that one particular commit is checked out later.
- Besides the data, each commit has some metadata, or information about the commit itself: who made it (name and email address), when (date and time stamp), and so on.

The metadata separate the "who made this commit" into two parts: the author is the name, email, and timestamp from whoever made the commit originally, and the committer is the name, email, and timestamp of the person who made this variant of the commit. So when we copy an old commit like this, we retain the original author, but set up a new committer. If you're copying your own commits, this means that the name-and-email doesn't really change—the old one had you as both, and the new one has you as both—but the committer time-stamps do change.
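The author/committer split can be observed directly. The sketch below is an illustration, not what rebase runs internally: it uses `git commit --amend` as a stand-in for "make a new variant of this commit", and pins the dates through the `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables so the result is repeatable. The file name and dates are arbitrary.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > file.txt
git add file.txt
GIT_AUTHOR_DATE="2020-01-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2020-01-01 12:00:00 +0000" \
    git commit -q -m "original commit"
old_hash=$(git rev-parse HEAD)
old_author=$(git log -1 --format=%aI)
old_committer=$(git log -1 --format=%cI)

# Make a new variant of the commit at a later committer date.
GIT_COMMITTER_DATE="2021-06-15 08:30:00 +0000" \
    git commit -q --amend --no-edit
new_hash=$(git rev-parse HEAD)
new_author=$(git log -1 --format=%aI)
new_committer=$(git log -1 --format=%cI)

echo "author date:    $old_author -> $new_author"
echo "committer date: $old_committer -> $new_committer"
echo "hash ID:        $old_hash -> $new_hash"
```

The author date carries over, the committer date does not, and the hash ID changes because the metadata is part of what gets hashed.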

Most importantly, though, each commit records the hash ID of its previous or parent commit. The point of rebasing is typically to take a string of commits like this:
```
          I--J--K   <-- feature
         /
...--G--H--L   <-- mainline
```
and make new-and-improved versions of commits I, J, and K, so that the new commits descend from L rather than from H:
```
          I--J--K   <-- feature
         /
...--G--H--L   <-- mainline
            \
             I'-J'-K'   <-- new-and-improved-feature
```
where commit I' is a "copy" (sort of) of commit I, J' is a copy of J, and K' is a copy of K.

Without worrying too much about the mechanics of the copying process—though I'll mention here that it uses git cherry-pick—let's make one last observation, which is that the way we (and Git) find commits is to use the branch name to find the last commit in the chain. When commit H was the last commit of mainline, we found it because we had:

...--G--H   <-- mainline

The name mainline held the hash ID of commit H. So git checkout mainline would extract commit H for us to use or work on/with. But then we, or someone, made a new commit that added on to mainline, which we are calling commit L, so that we have:

...--G--H--L   <-- mainline

The name mainline now holds the hash ID of commit L. A git checkout mainline command will extract commit L for us to use. To even find commit H, we have to have Git open up commit L and read its metadata. This metadata contains the raw hash ID of earlier commit H.
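A small sketch of this, again in a throwaway repository with made-up file contents: the name `mainline` resolves to one hash ID before the new commit and another after it, and the raw commit object for L carries a `parent` line holding the hash ID of H.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch "mainline"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m H
hash_h=$(git rev-parse mainline)

echo two >> file.txt
git commit -q -am L
hash_l=$(git rev-parse mainline)

echo "mainline pointed to $hash_h, now points to $hash_l"

# The raw commit object for L: its "parent" line is how Git finds H.
git cat-file -p mainline
parent=$(git cat-file -p mainline | sed -n 's/^parent //p')
```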

What this means for us is that once we have accomplished this:

          I--J--K   <-- feature
         /
...--G--H--L   <-- mainline
            \
             I'-J'-K'   <-- new-and-improved-feature

we can take the name feature off commit K and paste it onto commit K' instead, like this:

          I--J--K   ???
         /
...--G--H--L   <-- mainline
            \
             I'-J'-K'   <-- feature

Now, when we try to see what commits are on branch feature, we'll have Git start by using the name feature to locate commit K'. Commit K' points back to earlier commit J', which points back to I', which points back to L. Our rebase will be complete once we move the branch name, and toss out any funky special name that we might have been using while building the I'-J'-K' sequence.

(Exercise: What happens to commits I-J-K? Does it matter? How would we even know if they're still in the repository?)

With the before-and-after in mind, let's look at how `git rebase` works

I mentioned above, rather briefly, that git rebase uses git cherry-pick to copy each commit. The cherry-pick command, in turn, works by ... well, technically it's a full-blown three-way merge, but it's easier to see it, at first, by looking at what happens when we compare just two commits.

Let's start with this, our "before" picture:

          I--J--K   <-- feature
         /
...--G--H--L   <-- mainline

We need to have Git check out commit L, which is where we want to have the new commits go. If we were doing this the normal way, we'd make a new branch name such as tmp, using:

git checkout -b tmp hash-of-L

(or the same with the git switch command in Git 2.23 or later). Git actually uses what it calls detached HEAD mode for this, with the special name HEAD pointing directly to a commit:

git checkout hash-of-L

or:

git switch --detach hash-of-L

which produces this:

             I--J--K   <-- feature
            /
   ...--G--H--L   <-- HEAD, mainline

Now Git runs git cherry-pick hash-of-I. Git saved the hash IDs of commits I, J, and K during the whole setup process. If you use git rebase --interactive here, you'll see pick commands that list these hash IDs.¹ The pick represents a cherry-pick command.

The cherry-pick itself winds up comparing the saved snapshot in commit H against the saved snapshot in commit I. The difference between these two snapshots is, in effect, a set of instructions that can be applied to a snapshot as well. Applying that set of instructions to the snapshot in H produces the snapshot in I. But what if we apply these instructions to the snapshot in L?

If we do just that—and assuming it works and has no merge conflicts²—and make a new commit from the result, we'll get commit I'. We will have Git save the original author information and the original commit message as-is, and generate a new set of committer information and use the snapshot we got by applying the diff. The result is:

             I--J--K   <-- feature
            /
   ...--G--H--L   <-- mainline
               \
                I'  <-- HEAD

Git now goes on to do a git cherry-pick hash-of-J, to copy commit J by comparing I-vs-J and applying this to I':

             I--J--K   <-- feature
            /
   ...--G--H--L   <-- mainline
               \
                I'-J'  <-- HEAD

Finally—since there are only three commits—we do our last cherry-pick of commit K, which compares J-vs-K (and J-vs-J' if you are interested in the merge aspect of cherry-pick) to build commit K', which leaves us with this:

             I--J--K   <-- feature
            /
   ...--G--H--L   <-- mainline
               \
                I'-J'-K'  <-- HEAD

and the only task left is to move the name feature to point to the current commit K' to get:

             I--J--K   ???
            /
   ...--G--H--L   <-- mainline
               \
                I'-J'-K'  <-- feature (HEAD)

This completes the rebase process.
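The three steps above (detach at L, cherry-pick each commit, move the branch name) can be reproduced by hand. This is a rough sketch of the idea, not what `git rebase` literally executes; the real command does more bookkeeping, such as recording state so it can be aborted or resumed. The helper `commit_file` and the one-file-per-commit layout are made up so that no pick conflicts.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch "mainline"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: add one new file and commit it under the given name.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file H
git checkout -q -b feature      # feature forks from H
commit_file I
commit_file J
commit_file K
git checkout -q mainline
commit_file L                   # mainline moves on to L

# Save the hash IDs of the commits to copy, oldest first.
i=$(git rev-parse feature~2)
j=$(git rev-parse feature~1)
k=$(git rev-parse feature)

# 1. Detach HEAD at L, the commit the copies should descend from.
git checkout -q --detach mainline
# 2. Copy I, J, and K in order; each pick makes I', J', or K'.
git cherry-pick "$i" "$j" "$k" >/dev/null
# 3. Move the name "feature" to K' and re-attach HEAD to it.
git branch -f feature HEAD
git checkout -q feature

git --no-pager log --oneline --graph mainline feature
```

The old commits I, J, and K still exist under their saved hash IDs; only the name `feature` has moved.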

¹The instruction sheet for git rebase, that you get to edit, has the hash IDs abbreviated. I've never been quite sure why: Git has to expand them back out to use them internally. Maybe the Git folks just think they look less intimidating when there are 7 or 12 random-looking characters instead of 40. For git describe output, where this might go in someone's email or something, sure—but here, they're just instructions on a temporary page, and if you edit them you can use "move line" instructions in your editor.

²Merge conflicts, if any, arise from comparing the snapshot in H vs the snapshot in L as well. That's the case for the first cherry-pick, at least. The two subsequent cherry-picks use commits I and J as the merge bases, with the --ours commits being the commit built in the previous step. This is where it all gets a little tricky.

What you want

I believe what you want is that, after each cherry-pick, some particular file in the new (copied) commit should exactly match that file as it appears in some particular earlier commit.

Let's assume that existing commit K has the desired version of the file. What we'll do—to avoid depending on Git not moving the name feature, and to let you pick any commit—is to create a temporary lightweight tag identifying this commit:

git tag temp-tag hash-of-K

Note: if there is not a single fixed version of the file that should go into every copied commit, you'll want a different strategy for locating the source commit for the checkout, but the rest can continue to work.

Next, we'll use git rebase -i. This turns the set of cherry-picks into an editable instruction sheet. Using our editor, after each pick command, we add a line using the exec or x command:

pick hash-of-I subject line of I
x /tmp/script

(assuming our little script has been put in /tmp/script and made executable).
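If hand-editing the sheet is a nuisance, `git rebase` also has an `--exec` (`-x`) option that appends an `exec` line after each commit-creating line for you. The sketch below shows the generated instruction sheet; it uses `true` as a placeholder for the real script and sets `GIT_SEQUENCE_EDITOR=cat` so that the "editor" just prints the sheet unchanged and lets the rebase proceed. The repository layout and the `commit_file` helper are invented for the demo.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: add one new file and commit it under the given name.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file H
git checkout -q -b feature
commit_file I
commit_file J
commit_file K
git checkout -q mainline
commit_file L
git checkout -q feature

# "cat" stands in for the editor: it prints the sheet and changes nothing,
# so the rebase runs with the generated pick/exec lines as-is.
todo=$(GIT_SEQUENCE_EDITOR=cat git rebase -i --exec true mainline 2>/dev/null)

# Show only the instructions, not the commented help text.
printf '%s\n' "$todo" | grep -E '^(pick|exec) '
```

Dropping `-i` and the `GIT_SEQUENCE_EDITOR` setting gives the same rebase without ever showing the sheet.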

Git will execute the cherry-pick command, all the way to its completion, which involves making the new commit (I', J', or K' in our example). Then it will run the script because of this x line. The script:

- Extracts a particular file from a particular commit: using temp-tag, we get the desired file from the desired commit, placing it into both Git's index and the working tree. (The index copy is the one that matters, but it's good to update the working tree too, for sanity's sake if nothing else.)
- Tests to see if the result merits replacing the tip commit (git commit --amend). This is our git diff-index --quiet HEAD. If the index still matches the current commit, there's nothing to change. Otherwise, we'll run git commit --amend, which shoves the current commit out of the way and makes a new one. Using --no-edit, we tell git commit to simply re-use the existing commit message.

Note: In this case, even if there are no changes, git commit --amend --no-edit is actually safe, but it's wasted effort. For this script and task, that's probably not really relevant, but it seems good not to perform a lot of unnecessary work.

So, this makes sure that each replacement commit is itself replaced during the rebase, whenever it needs to be, by a "corrected" commit in which the single file has been swapped for the one we want. That way, by the time Git gets around to yanking the branch name off the old branch and putting it onto the end of the replacement commits, each of the replacement commits is the actual desired new-and-improved commit.

Aside from cleaning up (removing the lightweight temp-tag tag and removing the script), nothing else needs to be done.
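Here is an end-to-end sketch of the whole recipe in a throwaway repository. Everything specific is an assumption made for the demo: the file `keep.txt`, the script location `fix.sh`, and a history arranged so that none of the picks conflicts. It uses `git rebase --exec` in place of hand-editing the sheet. Note that the script only runs after each pick has finished, so it does not by itself prevent a conflict inside a pick.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Demo User"
git config user.email "demo@example.com"

# H: keep.txt has ten lines; the two branches will edit different lines.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > keep.txt
git add keep.txt
git commit -q -m H

git checkout -q -b feature
echo a > a.txt; git add a.txt; git commit -q -m I
sed 's/^10$/ten (feature)/' keep.txt > keep.tmp && mv keep.tmp keep.txt
echo b > b.txt; git add keep.txt b.txt; git commit -q -m J
echo c > c.txt; git add c.txt; git commit -q -m K

git checkout -q mainline
sed 's/^1$/one (mainline)/' keep.txt > keep.tmp && mv keep.tmp keep.txt
git commit -q -am L

# Mark the commit whose copy of keep.txt must survive (K, the tip of feature).
git tag temp-tag feature
want=$(git rev-parse temp-tag:keep.txt)   # blob ID of the desired copy

# The script from the answer, with the tag and path hardcoded.
cat > "$work/fix.sh" <<'EOF'
#! /bin/sh
git checkout temp-tag -- keep.txt
git diff-index --quiet HEAD || git commit --amend --no-edit
EOF
chmod +x "$work/fix.sh"

git checkout -q feature
git rebase --exec "$work/fix.sh" mainline >/dev/null 2>&1

# Every copied commit should now carry K's version of keep.txt.
for c in feature~2 feature~1 feature; do
    git diff --quiet temp-tag "$c" -- keep.txt && echo "$c: keep.txt matches temp-tag"
done

# Clean up the temporary tag and script.
git tag -d temp-tag >/dev/null
rm "$work/fix.sh"
```

In this layout the mainline edit to line 1 of `keep.txt` is discarded in the rebased commits, which is the point: the copy from the tagged commit wins.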
