tanager

Reputation: 189

git merge after renaming of all files

There are other answers regarding handling merge for a rename, but my case is complicated enough that I thought it warranted a separate question.

We have a git project that originally consisted of 20+ repositories. We used a wrapper script to handle many of the standard git operations. Because we are now moving to GitHub, we cannot handle the project this way.

So, we moved all repositories into a single repository, essentially using the method described on saintgimp. This, of course, means that all files have now been renamed, but the SHAs are identical historically.

OK, so now I want to merge branch source into branch target, noting that I made sure the two were in sync right before the final cutover. My first attempt, using git merge <source> caused thousands of conflicts, complaints about files that were changed/deleted on one side or the other, etc.

Then I found a gem on the Advanced Merging page:

If you want to do something like this but not have Git even try to merge changes from the other side in, there is a more draconian option, which is the “ours” merge strategy. This is different from the “ours” recursive merge option.

Ah, this sounds like what I need. OK, I performed the following:

$ git merge -s ours SHA

where SHA is the last commit from the reunification. In other words, I want all history, up to and including SHA, to be considered already merged into target. My hope was that this would be a one-time merge and would fix all future merges.

Now, when I try to merge the first new commit from source, the effect is correct, but I continue to get the following warning:

[user@host src] git merge --no-commit next_unmerged_commit
Auto-merging /path/to/file/changed/foo.c
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at least 5384 and retry the command.
Automatic merge went well; stopped before committing as requested

And, in fact, if I set renamelimit to 10000, the next merge (call it B) is performed without warning, but at a cost of much slower performance. Once again, a one-time cost is acceptable and I'll pay that cost if my subsequent merges are made normal again.

The next merge, C, where I use the default renamelimit, again gives the warning.

So, finally, my question: How can I convince git that the target branch is in sync with source so that it will stop trying to reach back in history before the reunification? I want to be able to merge without an increased renamelimit, due to the performance degradation.

Upvotes: 3

Views: 1810

Answers (1)

torek

Reputation: 487735

This really isn't a very good answer, as it's more about the script you used (or perhaps I should say the script you didn't use, since your comment says you used one based on the script you linked). Still, below I'll show the rather tangled graph I get from a hypothetical run of that script on some original repositories. Note that this particular script leaves all the converted branches with, in essence, commit B as their merge base, and commit B itself is empty.

Your question says:

now I want to merge branch source into branch target, noting that I made sure the two were in sync right before the final cutover.

As you'll see below, all the new branches are named after the project that they came from—there's no clear way to map source and target onto, e.g., P or Q. But if you were to run:

git checkout P/master
git merge Q/master

after the process illustrated below, the merge base for this git merge would be empty-commit-B and the merge would go smoothly: Git would look at the commits I drew as D and H respectively, trace their ancestries, find commit B as their merge base, and run two git diffs:

git diff <hash-of-B> <hash-of-D>   # what we did on P/master
git diff <hash-of-B> <hash-of-H>   # what they did on Q/master

The output of these git diffs would say that every file is created from scratch, and all their names are different: everything in P/master is named P/* and everything in Q/master is named Q/*. There would be no name collisions and the merge would complete on its own.
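As a rough illustration of that claim, here is a minimal sketch in a throwaway repository. The file names (README, P/foo.c, Q/bar.c) and the temporary directory are hypothetical; only the branch names and the A--B shape follow the description above.

```shell
set -e
repo=$(mktemp -d)                         # throwaway demo repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit A has one file; commit B removes it again, so B's tree is empty.
echo placeholder > README
git add README
git commit -q -m "A"
git rm -q README
git commit -q -m "B"
base=$(git rev-parse HEAD)

# Each project branch only adds files under its own directory.
git checkout -q -b P/master master
mkdir P
echo "int p;" > P/foo.c
git add P
git commit -q -m "D: P files under P/"

git checkout -q -b Q/master master
mkdir Q
echo "int q;" > Q/bar.c
git add Q
git commit -q -m "H: Q files under Q/"

git checkout -q P/master
git merge-base P/master Q/master          # prints the hash of commit B
git merge -q -m "Merge Q/master into P/master" Q/master
git ls-files                              # lists P/foo.c and Q/bar.c
```

Because neither side touches a path the other side uses, the merge has nothing to reconcile.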

Clearly, that's not what you're doing, then. But what you are doing, and which commit is the merge base, remains mysterious. It looks like you're picking out two tip commits such that the merge base of the two tip commits is a commit that does have files, and those files are not yet renamed from base to tip.
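To find out what the merge base actually is in a case like this, something along the following lines may help. This is only a sketch in a throwaway repository: the branch names source and target are borrowed from the question, and the file foo.c and directory P are hypothetical stand-ins.

```shell
set -e
repo=$(mktemp -d)                         # throwaway demo repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/source
git config user.name "Demo User"
git config user.email "demo@example.com"

# Shared history: the file still lives at the top level.
echo "int main(void) { return 0; }" > foo.c
git add foo.c
git commit -q -m "shared history: foo.c at top level"

# target moved everything into a sub directory.
git checkout -q -b target
mkdir P
git mv foo.c P/foo.c
git commit -q -m "move files into P/"

# source kept working on the old path.
git checkout -q source
echo "/* later change */" >> foo.c
git commit -q -am "later work on source"

base=$(git merge-base source target)
git ls-tree -r --name-only "$base"        # paths as the merge base knows them
git diff --name-status -M "$base" target  # renames between the base and target
```

If the first listing shows the old, un-renamed paths, every merge has to rediscover the renames from base to tip, which is where the rename-limit warning comes from.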

The point of the script you linked is to set things up so that the merge bases of each of the unrelated projects is an empty commit. Probably, the thing to do after that script—or in place of that script, really—is to do one massive octopus merge of all the final commits (NB: this is untested, as is probably obvious):

git checkout -B master P/master       # repoint master at P's final commit (git merge will not take several commits on an unborn branch)
git merge Q/master R/master           # make big octopus merge to marry all projects

The merge base of this octopus merge would again be commit B, and the result would be one merge that brings all the projects in under their new project/* names. The original repositories are now all mostly useless, though if there are new commits in them, you can fetch from them, add a renaming commit, and merge from the renaming commit (this would be easier if the importing script didn't delete the added remotes).
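The "fetch, add a renaming commit, merge" idea could look roughly like this. It is a sketch under assumptions: the repositories orig and combined, the remote name P-orig, the temporary branch P-rename, and the file foo.c are all hypothetical, and the remote is deliberately not called P so that P/master stays unambiguous.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical original project repository.
git init -q orig
cd orig
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'line 1\nline 2\nline 3\n' > foo.c
git add foo.c
git commit -q -m "P1"
cd ..

# Combined repository: import the project and move its files under P/.
git init -q combined
cd combined
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add P-orig ../orig
git fetch -q P-orig
git checkout -q -b P/master P-orig/master
mkdir P
git mv foo.c P/
git commit -q -m "Move P files into sub directory"

# Later, new work appears in the original repository.
(cd ../orig && echo 'line 4' >> foo.c && git commit -q -am "P2: new upstream work")

# Fetch it, add a renaming commit on a temporary branch, then merge that.
git fetch -q P-orig
git checkout -q -b P-rename P-orig/master
mkdir P
git mv foo.c P/
git commit -q -m "Move P files into sub directory (again)"
git checkout -q P/master
git merge -q -m "Merge renamed upstream work" P-rename
cat P/foo.c                               # now includes the upstream change
```

Both sides of this merge renamed foo.c the same way, so only the one changed file needs rename detection.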

Observations on the workings of the linked script

I've never faced this particular problem, but the approach in the script seems like a not-unreasonable starting point. I'd probably do it a bit differently, not bothering with an empty merge base and using git read-tree and git commit-tree to build and create the end octopus merge. The main key is to add a rename commit at the end of each incoming project branch (P/*, Q/*, etc) in the sketch below.
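A minimal sketch of that read-tree / commit-tree idea follows. It is untested beyond toy input, and the branch names p-import and q-import, the helper rename_commit, and the scratch index path are hypothetical. Only two projects are joined here; further -p arguments would turn the final commit into a true octopus.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Two unrelated histories standing in for the fetched projects P and Q.
git symbolic-ref HEAD refs/heads/p-import
echo "int p;" > p.c
git add p.c
git commit -q -m "P1"
git checkout -q --orphan q-import
git rm -q -rf .
echo "int q;" > q.c
git add q.c
git commit -q -m "Q1"

# Work in a scratch index so the real index and work-tree are left alone.
export GIT_INDEX_FILE="$repo/.git/scratch-index"

# Build a commit on top of tip $2 whose tree holds that tip's files under $1/.
rename_commit() {
  git read-tree --empty
  git read-tree --prefix="$1/" "$2"
  git commit-tree -p "$2" -m "Move $1 files into $1/" "$(git write-tree)"
}
p_tip=$(rename_commit P p-import)
q_tip=$(rename_commit Q q-import)

# Final merge: its tree is the union of the per-project directories.
git read-tree --empty
git read-tree --prefix=P/ p-import
git read-tree --prefix=Q/ q-import
merge=$(git commit-tree -p "$p_tip" -p "$q_tip" -m "Marry all projects" "$(git write-tree)")
git update-ref refs/heads/master "$merge"
unset GIT_INDEX_FILE

git ls-tree -r --name-only master         # lists P/p.c and Q/q.c
```

No merge machinery runs at all here: the trees are assembled directly, which sidesteps rename detection during the initial join.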

The script seems to work this way. It has as inputs projects P, Q, R (URLs whose last component is treated as a project name).

  1. Make empty repo.
  2. Make two initial commits:

    A--B   <-- master
    

Commit A has one file, commit B has no files (why not just commit the empty tree as B? but never mind).

  1. Loop, for all three projects. Here I have expanded the loop to view what's happening.

  2. (loop iteration 1) git remote add P <url> and git fetch P (with --tags!?). We're going to assume here that P has master and dev.

    A--B   <-- master
    
    P1-P2-...-Pm   <-- origin/P/master
           \
            Pd   <-- origin/P/dev
    
  3. Use git ls-remote --heads to find names for commits in P, i.e., the same set of names we have in refs/remotes/P/*. (Assumes the remote has not changed during fetch -- unwise but probably OK.)

    Loop over these names. Result again expanded in line for illustration...

  4. Run git checkout -b P/master master. Effect:

    A--B   <-- master, P/master
    
    P1-P2-...-Pm   <-- origin/P/master
           \
            Pd   <-- origin/P/dev
    
  5. Run git reset --hard for no apparent reason: no effect. Perhaps this might have some effect on some later step.

  6. Run git clean -d --force for no apparent reason: no effect.

  7. Run git merge --allow-unrelated-histories --no-commit remotes/P/master (does the merge, but does not commit yet) and then run git commit -m .... Effect:

    A--B   <-- master
        \
         \-------C   <-- P/master
                /
    P1-P2-...-Pm   <-- origin/P/master
           \
            Pd   <-- origin/P/dev
    
  8. Maybe rename files, with somewhat squirrelly code (lines 160-180): if project P has one top level directory named P, do nothing, otherwise create directory named P (with no check to see if this fails) and then in effect:

    git mv all-but-P P/
    git commit -m "[Project] Move ${sub_project} files into sub directory"
    

    giving:

    A--B   <-- master
        \
         \-------C--D   <-- P/master
                /
    P1-P2-...-Pm   <-- origin/P/master
           \
            Pd   <-- origin/P/dev
    

    Note that the git mv is given -k, so any individual move that would fail is skipped instead of aborting the whole command. However, except for subdirectory P and .git itself, all files in the top level of the work-tree should be in the index and the git mv should succeed unless one of them is named P (in which case, yikes!).

    I assume here that we did the mv, otherwise commit D does not exist.

  9. Repeat loop (see step 3) for dev. Run git checkout -b P/dev master:

    A--B   <-- master, P/dev
        \
         \-------C--D   <-- P/master
                /
    P1-P2-...-Pm   <-- origin/P/master
           \
            Pd   <-- origin/P/dev
    
  10. Presumably-ineffectual git reset and git clean again as in steps 5 and 6. (This might do something if the git mv in step 8 went really badly?) Do a funky two step merge as in step 7, giving:

    A--B   <-- master
       |\
       | \-------C--D   <-- P/master
                /
    P1-P2-...-Pm   <-- origin/P/master
           \
         \  Pd   <-- origin/P/dev
          \   \
           \---E   <-- P/dev
    

    where the line down from B connects to the one up from E. The graph has gotten rather out of hand at this point.

  11. Rename and commit as in step 8. I assume here that the project isn't already in a subdirectory, in both master, as already assumed, and dev.

    A--B   <-- master
       |\
       | \-------C--D   <-- P/master
                /
    P1-P2-...-Pm   <-- origin/P/master
           \
         \  Pd   <-- origin/P/dev
          \   \
           \---E--F   <-- P/dev
    
  12. Really ugly attempt to rename tags, at lines 190-207. This should have been done at fetch time, using a clever refspec. Whoever wrote this probably was not aware of annotated vs lightweight tags. It is not clear to me whether this works correctly and I did not look closely. Let's just assume no tags for now.

  13. Remove remote P. This removes the origin/P/* names too, but of course the commits stick around as they're retained by the new P/* branches:

    A--B   <-- master
       |\
       | \-------C--D   <-- P/master
                /
    P1-P2-...-Pm
           \
         \  Pd
          \   \
           \---E--F   <-- P/dev
    
  14. Repeat outer loop (step 1 of this list) with remote Q. We'll add Q and fetch (again with --tags, not a good plan as noted in step 12, but let's just assume no tags). So now we get another disjoint subgraph with origin/Q/* names. For simplicity let's just assume that only origin/Q/master exists this time:

    A--B   <-- master
       |\
       | \-------C--D   <-- P/master
                /
    P1-P2-...-Pm
           \
         \  Pd
          \   \
           \---E--F   <-- P/dev
    
    Q1-Q2-...-Qm   <-- origin/Q/master
    
  15. Run git checkout -b Q/master master:

    A--B   <-- master, Q/master
       |\
       | \-------C--D   <-- P/master
                /
    P1-P2-...-Pm
           \
         \  Pd
          \   \
           \---E--F   <-- P/dev
    
    Q1-Q2-...-Qm   <-- origin/Q/master
    
  16. Run the (probably ineffectual and still mysterious) git reset --hard and git clean steps.

  17. Use the funky two step merge with --allow-unrelated-histories to create new commit G like this:

         ---------------G   <-- Q/master
        /               |
    A--B   <-- master   | (down to Qm)
       |\
       | \-------C--D   <-- P/master
                /
    P1-P2-...-Pm
           \
         \  Pd
          \   \
           \---E--F   <-- P/dev
    
                 / (up to G)
                /
    Q1-Q2-...-Qm   <-- origin/Q/master
    
  18. Again, optional: rename all files in G to live in Q/ and commit. Again let's assume this does happen:

         ---------------G--H   <-- Q/master
        /               |
    A--B   <-- master   | (down to Qm)
       |\
       | \-------C--D   <-- P/master
                /
    P1-P2-...-Pm
           \
         \  Pd
          \   \
           \---E--F   <-- P/dev
    
                 / (up to G)
                /
    Q1-Q2-...-Qm   <-- origin/Q/master
    
  19. Ugly attempt to rename tags; we'll ignore this.

  20. Remove remote Q and origin/Q/* names. (No need to draw this.)

  21. Repeat outer loop for repository R. Assuming it has only its own master, we'll get a tangled graph like this:

         --------------------I--J   <-- R/master
        /                    | (down to Rm)
       /
       | ---------------G--H   <-- Q/master
       |/               |
    A--B   <-- master   | (down to Qm)
       |\
       | \-------C--D   <-- P/master
                /
    P1-P2-...-Pm
           \
         \  Pd
          \   \
           \---E--F   <-- P/dev
    
                 / (up to G)
                /
    Q1-Q2-...-Qm
                    / (up to I)
                   /
    R1-R2-...----Rm
    

(end of analysis)
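For reference, one pass of the loop analyzed above might be sketched like this, with the tag renaming done at fetch time through a refspec instead of afterwards. This is not the linked script: the repositories orig and combined, the remote name P-orig, and the tag v1.0 are hypothetical. Note also that an annotated tag object would still record its original name internally even when the ref lives under refs/tags/P/.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical source project with one branch and one lightweight tag.
git init -q orig
cd orig
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "int p;" > foo.c
git add foo.c
git commit -q -m "P1"
git tag v1.0
cd ..

# Combined repository with the A--B starting point (B has no files).
git init -q combined
cd combined
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo seed > seed.txt
git add seed.txt
git commit -q -m "A"
git rm -q seed.txt
git commit -q -m "B"

# Fetch branches as usual, but map tags into a per-project namespace.
git remote add P-orig ../orig
git fetch -q --no-tags P-orig \
    '+refs/heads/*:refs/remotes/P-orig/*' \
    '+refs/tags/*:refs/tags/P/*'

# Two step merge (commit C), then the rename commit (commit D).
git checkout -q -b P/master master
git merge -q --allow-unrelated-histories --no-commit refs/remotes/P-orig/master
git commit -q -m "Merge P master"
mkdir P
git mv -k foo.c P/
git commit -q -m "Move P files into sub directory"

# Dropping the remote removes its remote-tracking names but not the tags.
git remote remove P-orig
git tag -l                                # shows the namespaced tag
git ls-files                              # shows the file under P/
```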

Upvotes: 2
