Reputation: 100190

Merging two branches A and B where A contains files that were deleted from B

Ok, so the basic idea is that there are some files that I want to ensure get reinstated or "undeleted".

I am not exactly sure what happened, but I have git branches A and B and I am 99% sure that B was branched off of A. After I created B, I deleted some files from B and then made a lot of changes. I now want to merge with A, but I am afraid that what will happen is that the deleted files from B will be permanently dropped when I merge the branches.

Has anyone handled this type of situation before? The first thing I will try is to create a branch C from A and then try merging C with B, but my guess is that doesn't solve the problem.

Upvotes: 1

Answers (1)

torek

Reputation: 489083

> I am not exactly sure what happened, but I have git branches A and B and I am 99% sure that B was branched off of A.

The merge process does not care how you got to where you are now,¹ only where you actually are. More specifically, it needs three items:

1. Your current commit (HEAD), on whatever branch it's on, if any. Let's call this "our" commit and/or the "local" commit. When we need it (soonish), Git has --ours.
2. Your "other" commit, from the argument to git merge: e.g., git merge foo will resolve the name foo to find its specific commit ID. Most parts of Git that give it a name call this "their" commit (although one part calls it the "remote" commit, which is particularly confusing since it has nothing to do with Git's fetch and push remotes). Let's just call this the "other" commit, but when we need it (soonish), Git uses --theirs.

   This --ours and --theirs stuff holds even if both commits are made by you (another reason to call them "local" and "other", I think).
3. The merge base.² This is the most recent³ commit that is on both branches, i.e., the point from which the two branches diverge.

It's this third thing—the merge base—that you are a bit vague about. You can simply instruct Git to tell you which commit that is, using the command git merge-base.
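
As a minimal sketch of asking Git for the merge base, the script below builds a throwaway repository in which B really was branched off of A, then runs git merge-base. The repository, file names, and commit messages are invented for illustration; only the branch names A and B come from the question.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A       # call the first branch A, whatever the default name is
git config user.name "Demo"
git config user.email "demo@example.com"

echo shared > shared.txt
git add shared.txt
git commit -q -m "commit shared by A and B"
fork_point=$(git rev-parse HEAD)

git checkout -q -b B                     # B starts here...
echo b > b-only.txt
git add b-only.txt
git commit -q -m "work on B"

git checkout -q A                        # ...and A moves on as well
echo a > a-only.txt
git add a-only.txt
git commit -q -m "work on A"

merge_base=$(git merge-base A B)         # the argument order does not matter here
echo "merge base of A and B: $merge_base"
git log --oneline -n 1 "$merge_base"
```

In a real repository only the last three lines matter: if the commit that git merge-base prints is an ancestor of A's tip from before you started deleting files, the "99% sure" becomes 100%.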

> After I created B, I deleted some files from B and then made a lot of changes. I now want to merge with A, but I am afraid that what will happen is that the deleted files from B will be permanently dropped when I merge the branches.

They probably will—except that nothing is permanent in Git. Or we could say that everything is permanent which means nothing is permanent, which is getting awfully philosophical, but is basically like saying that if everything is 100% critical, then nothing really matters, or at least, nothing matters more than anything else. :-)

Here are some things to consider, and then some tools to consider for providing these things:

Git performs a merge by comparing the merge base to the two (local and other) commits. This tells it, in effect, what has happened on each "side" of the branching that happened since the common base commit.

The merge is done by taking one copy of each change made on each side. Git declares a conflict whenever it can't do that.

So, suppose file misc.txt is deleted in the base-to-local change-set, and has a line added in the base-to-other change-set. How do we combine "delete entire file" with "add this line"? Git's answer is: "I don't, I just declare a merge conflict on that file."

On the other hand, suppose file samp.dat is deleted in base-to-local and has no change made in base-to-other. How do we combine these? The answer is clear: delete the file.

There are numerous other conflict cases, but the above shows the ones you are particularly worried about, here.
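
Both cases can be reproduced in a scratch repository. This is only a sketch: the file contents and commit messages are made up, and keep.txt is an extra file added so that B's tree is never empty. Here B is the local side and A is the other side.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A
git config user.name "Demo"
git config user.email "demo@example.com"

# Merge base: three files.
echo "line 1" > misc.txt
echo "data" > samp.dat
echo "keep" > keep.txt
git add .
git commit -q -m "base"

# Local side (B): delete misc.txt and samp.dat.
git checkout -q -b B
git rm -q misc.txt samp.dat
git commit -q -m "B: delete misc.txt and samp.dat"

# Other side (A): add a line to misc.txt, leave samp.dat alone.
git checkout -q A
echo "line 2" >> misc.txt
git commit -q -a -m "A: add a line to misc.txt"

# Merge A into B. The merge stops with a conflict, so record that
# instead of letting "set -e" abort the script.
git checkout -q B
merge_result=clean
git merge A || merge_result=conflict

conflicted=$(git diff --name-only --diff-filter=U)   # unmerged paths only
echo "merge result: $merge_result; conflicted: $conflicted"
git status --short                                   # samp.dat is simply absent
```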
Do you want to merge A into B, or B into A? What, precisely, is the difference?

For instance, above, we just said what happens if base-to-local deletes samp.dat while base-to-other leaves it unmodified: Git deletes samp.dat in the merge result. If base-to-other deletes the file, and base-to-local leaves it unmodified, Git also deletes the file, as the change is still "one side says delete, one side says do nothing". It doesn't matter which side says it, just that it's exactly one side. (Or, if both sides say to delete the file, there's no problem there either.)

In fact, the merge result is always the same: one set of changes says do X, the other says do Y; if Git can combine them, the result is that combination, otherwise the result is "conflict", with whatever Git can do best to record both sets of changes. But in all cases the result is symmetric: combine(X, Y) ≡ combine(Y, X). (If it doesn't show up in all fonts there is a three-line "=" sign, "equivalent to" symbol, between the two "combine"s.)

The only difference, then, is that Git will make the new merge commit on whatever branch you are on when you start the process—and of course, whatever branch you are on, you need to name the other branch (or its tip commit) in the git merge command. It's the work-tree (combined merge result) that is the same, not the commit itself.
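
One way to convince yourself of the symmetry is to do the merge in both directions on a detached HEAD and compare the resulting trees. A sketch, with invented file contents:

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A
git config user.name "Demo"
git config user.email "demo@example.com"

echo "data" > samp.dat
echo "v1" > keep.txt
git add .
git commit -q -m "base"

git checkout -q -b B
git rm -q samp.dat
git commit -q -m "B: delete samp.dat"

git checkout -q A
echo "v2" > keep.txt
git commit -q -a -m "A: change keep.txt"

# Merge A into B on a detached HEAD, so neither branch name moves.
git checkout -q --detach B
git merge -q -m "merge A into B" A
a_into_b=$(git rev-parse HEAD)

# Merge B into A, again detached.
git checkout -q --detach A
git merge -q -m "merge B into A" B
b_into_a=$(git rev-parse HEAD)

tree_1=$(git rev-parse "$a_into_b^{tree}")
tree_2=$(git rev-parse "$b_into_a^{tree}")
echo "tree of A-into-B: $tree_1"
echo "tree of B-into-A: $tree_2"      # same tree ID, different commit ID
```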
Finally, suppose that you have some commit—maybe a merge commit, or maybe just a regular ordinary commit—that removes file misc.txt. Now, remembering that Git is a version control system and saves everything forever,⁴ suppose that we tell Git to extract misc.txt from the commit just before this commit, and make a new commit. The new commit has, as its work-tree contents, everything that was in the one that removed misc.txt plus the version of misc.txt resurrected from the earlier commit.

If I direct you to look at the new commit instead of the old one, will you care how misc.txt got resurrected? (There are cases where you might care nothing, cases where you might care a little, and a few cases where you might care a lot, so this is a question to ponder for a while, or come back to eventually. But it gets at the philosophical question of what it means for anything to be permanent as long as everything is. The old commit permanently removed misc.txt and the new one permanently resurrected it, so what does "permanent" even mean?⁵)
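
A sketch of that resurrection, assuming the removing commit's ID has been saved in a shell variable (called removal here, a name chosen only for this example):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "precious" > misc.txt
echo "keep" > keep.txt
git add .
git commit -q -m "add files"

git rm -q misc.txt
git commit -q -m "remove misc.txt"
removal=$(git rev-parse HEAD)          # the commit that removed misc.txt

# Extract misc.txt from the commit just before the removal.
# This writes the file to both the index and the work-tree.
git checkout "$removal^" -- misc.txt
git commit -q -m "resurrect misc.txt"

git log --oneline --name-status        # add, delete, add again: all three commits remain
```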
Since the merge is itself a new commit, you can repeat the merge, or at least any automatic, Git-performed portion of it, any time later. The old commits are permanent and you can simply get on an anonymous branch from one of the old commits, and merge the other old commit, to repeat Git's work. If the old commit stopped with merge conflicts, the new one will stop with the same conflicts (as long as you use the same merge strategy and options, anyway).
In the case of a conflict, you can't repeat directly any hand resolution you chose ... but you don't have to, as the final resolution is recorded in the merge commit. If you got it wrong then, you can redo the merge, grab any final resolutions that are correct, and re-hand-resolve the remaining conflicts. This includes removing and/or resurrecting files.
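
Here is a sketch of repeating a merge later, using the merge commit's own parents as the two inputs. The variable names and file contents are invented, and the merge in this example is a clean one:

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A
git config user.name "Demo"
git config user.email "demo@example.com"

echo "data" > samp.dat
echo "v1" > keep.txt
git add .
git commit -q -m "base"

git checkout -q -b B
git rm -q samp.dat
git commit -q -m "B: delete samp.dat"

git checkout -q A
echo "v2" > keep.txt
git commit -q -a -m "A: change keep.txt"

# The original merge, made on branch B.
git checkout -q B
git merge -q -m "merge A into B" A
original_merge=$(git rev-parse HEAD)

# Some time later: repeat Git's work on an anonymous (detached) HEAD.
git checkout -q --detach "$original_merge^1"               # the old local commit
git merge -q -m "replay of the merge" "$original_merge^2"  # the old other commit
replayed_merge=$(git rev-parse HEAD)

# Prints nothing when the replay matches the original merge exactly.
git diff --stat "$replayed_merge" "$original_merge"
git checkout -q B
```

If the original merge had been adjusted by hand, the final git diff would show exactly those adjustments, which is a handy way to see what was done beyond Git's automatic work.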

Anyway, with those out of the way, let's look at these tools for merging:

git diff. Git is going to combine the changes from the merge base commit, to the two tip commits. You can look at these changes, before, during, and/or after the merge. There is a nice shortcut pre-merge, that stops working post-merge (because the current branch name now points to the new merge commit):
```
git diff A...B
git diff B...A
```
(note the three dots). The first command compares the merge base to commit B, and the second compares the merge base to commit A. Add --name-status to see which files are deleted on which side(s); optionally add --diff-filter=D to select only files whose status is 'D'eleted. Then check whether those same files are modified on the other side (--diff-filter=M, for 'M'odified). A file deleted on one side and unchanged on the other is one that git merge will delete; a file deleted on one side and modified on the other is one it will flag as a conflict.
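
Putting the three-dot diffs to work before merging might look like the sketch below. The repository is a made-up one in which B deleted two files and A modified one of them; the loop at the end is just one possible way to cross-check the two lists, and it assumes file names without whitespace.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A
git config user.name "Demo"
git config user.email "demo@example.com"

echo "line 1" > misc.txt
echo "data" > samp.dat
echo "keep" > keep.txt
git add .
git commit -q -m "base"

git checkout -q -b B
git rm -q misc.txt samp.dat
git commit -q -m "B: delete misc.txt and samp.dat"

git checkout -q A
echo "line 2" >> misc.txt
git commit -q -a -m "A: add a line to misc.txt"

echo "Deleted between the merge base and B:"
git diff --name-status --diff-filter=D A...B
echo "Modified between the merge base and A:"
git diff --name-status --diff-filter=M B...A

deleted_on_b=$(git diff --name-only --diff-filter=D A...B)
modified_on_a=$(git diff --name-only --diff-filter=M B...A)

# Unquoted on purpose: split the list into one file name per word.
for f in $deleted_on_b; do
    if printf '%s\n' "$modified_on_a" | grep -qxF -- "$f"; then
        echo "$f: deleted on B, modified on A -> expect a conflict"
    else
        echo "$f: deleted on B, not modified on A -> the merge will delete it"
    fi
done
```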
Add --no-commit to git merge and it will stop after merging, even if there are no conflicts for you to resolve:
```
Automatic merge went well; stopped before committing as requested
```
You may now make by-hand changes to the tree. These are, of course, not automatic, so a later re-merge will not do them. Is this a good idea? That depends on what you want from your permanent commits!
At some point during resolving (either from --no-commit or due to a merge conflict), you can extract specific files from either side of the merge.

Normally, one does this during a merge conflict, after inspecting everything carefully. We might, for instance, decide that the merge should take our version of the file from our local commit, completely ignoring their changes from the other commit:
```
git checkout --ours -- file.ext
```
Or we might decide that the other commit is the right one:
```
git checkout --theirs -- file.ext
```
These work only if the file is conflicted. If Git resolved the merge on its own, we may need a different way to name the file. We can simply specify the commit that has the version we want:
```
git checkout a012345 -- file.ext
```
for instance. But typing in SHA-1 hashes is ugly and error prone. Fortunately, there are two names we can use: HEAD is the local commit (ours) and MERGE_HEAD is the other commit (theirs), so:
```
git checkout HEAD -- file.ext
git checkout MERGE_HEAD -- restored1.py restored2.tex
```
(the -- is not usually required, just usually a good idea: it is a good habit that protects you if the file you need is the oddly-named --theirs, for instance).
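
Combining --no-commit with MERGE_HEAD gives a way to keep a file that the merge would otherwise drop. A sketch for the situation in the question, with B as the current branch and samp.dat standing in for one of the deleted files (all contents invented):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A
git config user.name "Demo"
git config user.email "demo@example.com"

echo "data" > samp.dat
echo "v1" > keep.txt
git add .
git commit -q -m "base"

git checkout -q -b B
git rm -q samp.dat
git commit -q -m "B: delete samp.dat"

git checkout -q A
echo "v2" > keep.txt
git commit -q -a -m "A: change keep.txt"

git checkout -q B
git merge --no-commit A               # merges cleanly, then stops before committing
ls                                    # samp.dat is gone at this point

# Bring the file back from the other commit (A's tip), then finish the merge.
git checkout MERGE_HEAD -- samp.dat
git commit -q -m "Merge A into B, keeping samp.dat"
```

The resulting commit is still a two-parent merge; it just has a tree that differs from what Git would have produced on its own.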

Finally, you can of course just do the merge, letting it make a commit, then extract any files you want restored—probably from the tip of the other branch:

```
git checkout A -- restore.me restore.me.too
```

and make another commit, or use git commit --amend to replace the merge commit with a new merge commit that has the restored files.⁶
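
The merge-first, restore-afterwards variant, sketched with the --amend option so that the restored file ends up in the merge commit itself. As before, the repository and the file name samp.dat are made up for the example:

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A
git config user.name "Demo"
git config user.email "demo@example.com"

echo "data" > samp.dat
echo "v1" > keep.txt
git add .
git commit -q -m "base"

git checkout -q -b B
git rm -q samp.dat
git commit -q -m "B: delete samp.dat"

git checkout -q A
echo "v2" > keep.txt
git commit -q -a -m "A: change keep.txt"

git checkout -q B
git merge -q -m "Merge A into B" A    # commits right away; samp.dat is not in the result

git checkout A -- samp.dat            # take the file from the tip of A
git commit -q --amend --no-edit       # new merge commit, same parents, plus samp.dat

git log --oneline --graph -n 4
```

Leaving out --amend and making an ordinary follow-up commit works just as well, and keeps the automatic merge result visible in the history.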

¹It probably should care, or at least, should be able to, in order to detect file renames better. That's another discussion entirely though.

²Technically, there could be more than one merge base. If so, git merge-base --all will find all of them; and if so, it takes more work to predict what will happen. For most merges, though, there is only one.

³Git does not (and must not) go by the time stamps on the commits, but rather by the shape of the graph. Hence "most recent" is not quite right, but it does capture the idea that if there are many commits on both branches (which is almost always true) we want the one "closest to" the local and other commits, and the further back we go behind the "first" one, the older those commits are in terms of graph-history, even if the time stamps on them are bizarre.

⁴In fact, some things do get discarded, but only once they are unreferenced, which is a topic for a separate discussion.

⁵There is, in fact, a strictly correct answer to this question, at least for Git. That answer ties into the identity of Git's hashes, which is connected to how you can use digital signatures to "sign off" on particular commits. The actual permanence of anything is its hash ID: as long as the hash ID remains valid, the commit still exists, keeping the "permanent" permanent ... unless, that is, you manage to break SHA-1 hashing.

⁶Note that --amend (and, for that matter, rebase, especially with -i) seems to violate the "save everything forever" rule. But in fact it doesn't: neither command replaces anything; they merely add new commits and hide the old ones from git log. The hidden ones do eventually become entirely unreferenced, and hence expire as in footnote 4, but you have plenty of time (normally at least a month) to get them back. And any commit still viewable in regular git log --all never expires; it's just the ones shoved aside by --amend or rebase that, once out of view, eventually expire.
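
To see that an amended commit is hidden rather than destroyed, here is a small sketch that relies on the reflog, which Git keeps by default in an ordinary (non-bare) repository. The file name and commit messages are invented.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo v1 > file.txt
git add file.txt
git commit -q -m "first"

echo v2 > file.txt
git commit -q -a -m "second, with a tpyo"
old=$(git rev-parse HEAD)

git commit -q --amend -m "second, fixed"
new=$(git rev-parse HEAD)

git log --oneline                      # shows only the amended commit
git reflog -n 2                        # the shoved-aside commit is still listed here
previous=$(git rev-parse 'HEAD@{1}')   # where HEAD pointed before the amend
git cat-file -t "$old"                 # the old object is still in the repository
```

From here, git checkout with the old hash, or git branch with a new name and that hash, gets the shoved-aside commit back for as long as the reflog entry lasts.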

Upvotes: 3
