Git Merge a Folder into a Repository using a Specified Ancestor Folder

Question

I have a git repository, call it Repo1:

Repo1
    Folder1
    Other stuff...

and I have two folders that contain a subset of the files in Repo1. A Baseline folder:

Baseline
    Folder1

...and a ChangeSet folder:

ChangeSet
    Folder1

Baseline contains the files from Repo1 that represent the common ancestor of any files in ChangeSet.

I'd like to do a 3-way merge of the changes from ChangeSet into Repo1. I've looked into creating a temporary repository containing two commits, the first for the baseline, and the second for the changeset, and then merging with --allow-unrelated-histories:

git merge  --no-commit --allow-unrelated-histories

...but this appears to mark any changes as conflicts, and doesn't seem to use the ancestor at all.

I'm guessing that I could use git-merge-file to merge any non-binary files that may exist in all three locations, and then handling all binary conflicts, added, deleted files etc. myself, but I wonder if there is a more straightforward solution.

Thanks in advance.

Edit: From the answer below, Changeset was probably the wrong choice of words for the updated files folder. Probably a better word would be Snapshot

Update 2021: The completed script for this question is now on GitHub as git-stash2d

torek · Accepted Answer

Edit: You're on the right track in your own answer: cherry-pick is almost certainly the way to go, for your actual case. The trick is to put their original tree in as an "orphan branch" (independent commit) and then to put their patch in as the second commit on this branch, then to go back to your own branch and use git cherry-pick. Cherry-picking is internally implemented as a full three-way merge, with the merge base being the parent of the commit being cherry-picked and the --theirs commit being the commit you name.

Instructions

In your original repository (or an added work-tree for that repository, if you don't want to mess with your main work-tree), do:

git checkout --orphan xxx         # use any name you like here
git read-tree -m -u 4b825dc642cb6eb9a060e54bf8d69288fbee4904

The hash ID here is that of the empty tree. Using --empty logically should work here but doesn't. Or instead of the read-tree, use:

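git rm -rf .

which has the same effect, and as a bonus is easier to type in, but looks scarier, somehow. 😀 (The -f is needed at this point: before the first commit on the new branch, Git treats everything in the index as a staged change, and a plain git rm -r . refuses to remove it.)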
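The empty-tree hash ID does not have to be copied by hand. As a small sketch, assuming a repository that uses the default SHA-1 object format (the variable name empty_tree is just for illustration), Git can compute it by hashing empty input as a tree object:

```shell
# Compute the ID of the empty tree instead of hard-coding it.
# Assumption: default SHA-1 object format; a SHA-256 repository
# would print a different ID.
empty_tree=$(git hash-object -t tree /dev/null)
echo "$empty_tree"
```

The result can then be passed along, as in git read-tree -m -u "$empty_tree".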

Your work-tree should now be empty and git status will say:

On branch xxx

No commits yet

nothing to commit (create/copy files and use "git add" to track)

If your work-tree is not empty, it contained untracked files before, and still does. You should move or remove them (or, again, you can do this all in an added work-tree).
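For the added work-tree route, here is a rough sketch. It assumes a Git version that has git worktree (2.5 or later), and it builds a throwaway repository under a temporary directory so that it can be run anywhere. The directory names repo1 and scratch and the file a.txt are made up for the demo.

```shell
set -e
demo=$(mktemp -d)

# Throwaway stand-in for the real repository (hypothetical contents).
git init -q "$demo/repo1"
cd "$demo/repo1"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir Folder1
echo "hello" > Folder1/a.txt
git add .
git commit -q -m "initial"

# Add a second work-tree with a detached HEAD, so the main
# work-tree is left completely alone.
git worktree add --detach "$demo/scratch" >/dev/null 2>&1
cd "$demo/scratch"

# Start the orphan branch and empty out index and work-tree here.
git checkout -q --orphan xxx
git read-tree -m -u 4b825dc642cb6eb9a060e54bf8d69288fbee4904
git status
```

When the work is done, git worktree remove (or deleting the directory and running git worktree prune) should get rid of the extra work-tree again.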

Now do what you suggested in your own answer:

# copy my Baseline folder changes in
git add .
git commit -m "baseline"

(side note: do not use git commit -a; it doesn't do what you want).

I had understood "changeset" to mean "a diff you will apply", rather than "a new set of files". Changeset is the wrong word to describe a new snapshot, but if that's a new snapshot, it's now time to empty out the work-tree again:

git rm -r .

to use the version that is easier to type in. Then, again almost straight from your own answer:

# copy my ChangeSet folder changes in
git add .
git commit -m "code"

You can now git checkout master and git cherry-pick xxx. Substitute in whatever branch name you're using to hold the two commits.
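Put together, the whole sequence might look like the sketch below. It runs against throwaway copies under a temporary directory; the file contents are invented for the demo, and the folder names Repo1, Baseline and ChangeSet simply mirror the question. The initial branch name is read from Git rather than assumed to be master.

```shell
set -e
demo=$(mktemp -d)

# Hypothetical stand-ins for the Baseline and ChangeSet folders.
mkdir -p "$demo/Baseline/Folder1" "$demo/ChangeSet/Folder1"
printf 'one\ntwo\nthree\nfour\nfive\n' > "$demo/Baseline/Folder1/a.txt"
printf 'one\ntwo\nthree\nfour\nFIVE\n' > "$demo/ChangeSet/Folder1/a.txt"  # their edit

# Hypothetical stand-in for Repo1, with its own edit to the same file.
git init -q "$demo/Repo1"
cd "$demo/Repo1"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir Folder1
printf 'ONE\ntwo\nthree\nfour\nfive\n' > Folder1/a.txt                    # our edit
echo "unrelated" > other.txt
git add .
git commit -q -m "repo1 state"
main=$(git symbolic-ref --short HEAD)   # master, main, or whatever is configured

# Orphan branch: first commit is the baseline snapshot...
git checkout -q --orphan xxx
git read-tree -m -u 4b825dc642cb6eb9a060e54bf8d69288fbee4904
cp -R "$demo/Baseline/." .
git add .
git commit -q -m "baseline"

# ...second commit is the new snapshot.
git rm -r -q .
cp -R "$demo/ChangeSet/." .
git add .
git commit -q -m "code"

# Back on the original branch, cherry-pick the second commit.
git checkout -q "$main"
git cherry-pick xxx
cat Folder1/a.txt
```

In this demo the two edits touch different lines, so the cherry-pick should complete without conflicts and other.txt is left alone; overlapping edits would stop with conflicts to resolve in the usual way.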

[Original answer below.]

I'd like to do a 3-way merge of the changes from ChangeSet into Repo1. I've looked into creating a temporary repository containing two commits,

You're at least one short. A merge has three inputs, not two:

the first for the baseline, and the second for the changeset, and then merging with --allow-unrelated-histories:

You're on the right track with using the first one for the baseline.

The other two that you need are:

one with baseline-plus-changes: this is their code, or the --theirs side of the merge, and
one with your code: this is the --ours side of the merge. This is the commit you will have checked out as HEAD as a result of running git checkout.

Both of these two commits must descend, historically speaking, from the baseline. That way Git can compare the merge base snapshot—in this case, the baseline—to each of the two branch tip snapshots: your code, and their-code-as-modified-by-their-changeset.

Hence:

# create initial commit in initial repository:
git init         # create new empty repository
# ... copy baseline into place ...
git add .
git commit

# add their changeset as a new commit on a branch:
git checkout -b theirs
# ... apply the changeset, perhaps with "git apply" ...
git add -u       # or git add . again, or similar
git commit

# add your version of the code as a new commit on master
# (newer Git versions may name the initial branch differently, e.g. main):
git checkout master
# ... copy your code into place ...
git add .        # or similar
git commit

Now you can run git merge theirs. The three inputs are a merge base commit, your current commit—the tip of master, also known as HEAD—and the commit you name: the tip commit of branch theirs.

The git merge command locates the merge base commit on its own. In this case, it's the baseline files, in the initial commit. The git merge command now produces two changesets:

baseline vs HEAD: this is what you changed;
baseline vs theirs: this is what they changed.

Note that this second comparison produces the changeset you used to create the theirs commit and its snapshot. This might seem like wasted effort—why not just give Git the changeset directly?—but it's just how Git itself is built: Git really needs that snapshot, so you have to make it.
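To watch Git find the merge base and produce those two comparisons, the three-commit layout can be tried in a scratch repository. This is only a sketch: the single file a.txt and its contents are invented, and the initial branch name is read from Git instead of being assumed.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/scratch"
cd "$demo/scratch"
git config user.name "Demo"
git config user.email "demo@example.com"

# Initial commit: the baseline.
printf 'one\ntwo\nthree\nfour\nfive\n' > a.txt
git add .
git commit -q -m "baseline"
main=$(git symbolic-ref --short HEAD)

# Their side: baseline plus their changes.
git checkout -q -b theirs
printf 'one\ntwo\nthree\nfour\nFIVE\n' > a.txt
git add -u
git commit -q -m "their changes"

# Our side: our version of the code.
git checkout -q "$main"
printf 'ONE\ntwo\nthree\nfour\nfive\n' > a.txt
git add .
git commit -q -m "our code"

# Git locates the merge base on its own; here it is the baseline commit.
base=$(git merge-base HEAD theirs)
git diff "$base" HEAD      # baseline vs HEAD: what you changed
git diff "$base" theirs    # baseline vs theirs: what they changed

# The merge combines the two sets of changes.
git merge -q --no-edit theirs
cat a.txt
```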

What if you already have a repository, and would like to do the work directly there?

In this case, you're in a bit of a bind (in the sense of "problematic situation"). Git finds the merge base on its own. You cannot just tell Git: do a merge, pretending that commit C is the merge base for some arbitrary commit C.¹

One option is to rewrite your entire repository into a structure that allows this. This is generally a bad idea unless you really want to switch over to a new history, discarding all clones as well.

Another is to create a second repository, or an independent sub-graph within your repository. This works fine: use git checkout --orphan and git read-tree --empty -u to get a clean slate for the new disconnected branches (don't call the main one master, of course; and, per the note in the instructions above, --empty does not actually work here, so use the empty-tree hash or git rm instead). You can then tie the new merge commit into the original history in your main graph. This is slightly tricky.

A third is to use git replace to insert a parent graph, so that your repository seems to have a new root commit. This is also slightly tricky. It's equivalent to the second method except that it leaves fewer traces of how you did it: whether you leave the replacement commit in place or not, it does not get copied on clone operations, so others trying to figure out how you did what you did, will probably be puzzled.

The last option is the one you described yourself:

... I could use git-merge-file to merge any non-binary files that may exist in all three locations, and then handling all binary conflicts, added, deleted files etc. myself ...

This method also works fine, and you can automate a lot of it with a script; it's just still a bit more painful than having Git do it.
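A very rough sketch of such a script follows, covering only the easy case of text files that exist in all three locations. The directory layout mirrors the question, the file contents are invented, and everything else (added files, deleted files, binary files) is merely reported, not handled.

```shell
set -e
demo=$(mktemp -d)

# Hypothetical stand-ins for the three folders.
mkdir -p "$demo/Repo1/Folder1" "$demo/Baseline/Folder1" "$demo/ChangeSet/Folder1"
printf 'one\ntwo\nthree\nfour\nfive\n' > "$demo/Baseline/Folder1/a.txt"
printf 'ONE\ntwo\nthree\nfour\nfive\n' > "$demo/Repo1/Folder1/a.txt"
printf 'one\ntwo\nthree\nfour\nFIVE\n' > "$demo/ChangeSet/Folder1/a.txt"

cd "$demo/ChangeSet"
find . -type f | while read -r f; do
    if [ -f "$demo/Repo1/$f" ] && [ -f "$demo/Baseline/$f" ]; then
        # Arguments are <current> <base> <other>; the result is written
        # back into the Repo1 copy. A non-zero exit status means
        # conflicts (markers left in the file) or an error, for
        # example a binary file.
        git merge-file "$demo/Repo1/$f" "$demo/Baseline/$f" "$f" ||
            echo "conflicts or error in $f"
    else
        # Not present in all three places: left for manual handling.
        echo "needs manual handling: $f"
    fi
done

cat "$demo/Repo1/Folder1/a.txt"
```

Files deleted in ChangeSet are not even noticed by this loop, which is part of why letting Git do the merge is less painful.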

¹Actually, you can do this, using git merge-recursive. This command is not meant to be run by a user, though. There is no documentation telling you how to run it, and the arguments are complicated: some of them are supplied as environment variables! Don't do it this way.
