Travis Jensen

Reputation: 5420

git filter-branch --tree-filter and modifying submodules

I'm moving the contents of a git repository into another repository, and, for all of the regular commits, everything is fine. I have run into problems with submodules, though.

Setting up, we start with two repos. We'll call them "docs" and "operations", and we want to move the contents of "docs" into a subdirectory of "operations", like this:

docs/
  file1.txt
  dir1/
    file2.txt
  other-docs/  <- This is a git submodule

operations/
  bin/
    do-things
  docs/
    important.txt

And we want the final version to look like this, where the "docs" repo ends up under "docs/legacy" in the operations repo:

operations/
  bin/
    do-things
  docs/
    important.txt
    legacy/
      file1.txt
      dir1/
        file2.txt
      other-docs/  <- This is a git submodule

I've got a script that uses a combination of git filter-branch --tree-filter and git rebase (to rebase the new content onto the existing content and handle conflicts like .gitignore files) to perform the actual migration, but, after running the migration, I end up with:

operations/
  bin/
    do-things
  docs/
    important.txt
    legacy/
      file1.txt
      dir1/
        file2.txt
  other-docs/  <- This is a git submodule

Where the other-docs submodule is still at the root of the new repo.

I understand why this is happening. As I go through the commits to move things, there isn't an actual file for the submodule, so, in the "everything in this directory is committed exactly as you leave it" model of git filter-branch --tree-filter, there is nothing to "leave" for a submodule.

So, first question: is there some aspect of using git filter-branch where I can account for this? One place I was wondering is if I can add --commit-filter and mess with things there, but I'm not completely clear on what the invariants around a commit filter are.

If not there, is there somewhere else I can do this? As far as I can tell, I will have to modify the existing commits that touch submodules, basically "removing" the submodule from the old, incorrect location and "adding" it at the new, correct location. I suppose I could script my way through an interactive rebase, finding those commits and amending them. It just sounds like a lot of work if there is a better way.

Any ideas appreciated.

Upvotes: 2

Views: 1387

Answers (2)

jthill

Reputation: 60295

Tree filters are easy, but they're slow and, as you've discovered, oblivious to anything that isn't a file in the work tree, such as a submodule entry. It's much better to check out only the content you need to alter and use git read-tree for the rest.

git filter-branch --index-filter '

        # load up the docs-repo commit we are importing under docs/legacy/
        # (imported-commit-for is a placeholder, not a git command: supply your own mapping)
        git read-tree --prefix=docs/legacy/ "$(imported-commit-for "$GIT_COMMIT"):"

        # hoist any imported submodule configs (the e flag below needs GNU sed)
        git checkout .gitmodules
        git checkout docs/legacy/.gitmodules 2>&- &&
        git config -f docs/legacy/.gitmodules --get-regexp "^submodule\." |
        sed -n "s,path ,path docs/legacy/,
             s,^,git config -f .gitmodules ,e"  &&
        git rm docs/legacy/.gitmodules &&
        git add .gitmodules

        # any other needed content updates here
'
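The "hoist" step can also be done without relying on GNU sed's e flag, by copying the submodule settings across with git config. What follows is only a sketch under some assumptions: hoist_gitmodules is a made-up helper name, its three arguments (source file, destination file, path prefix) are my own choice, and it assumes submodule names contain no whitespace.

```shell
#!/bin/sh
# hoist_gitmodules SRC DEST PREFIX  (hypothetical helper, not part of git)
# Copy every submodule.* setting from SRC into DEST, prepending PREFIX/
# to each submodule path so it matches the entry's new location.
hoist_gitmodules() {
    src=$1 dest=$2 prefix=$3
    # --get-regexp prints one "<key> <value>" pair per line, for example
    # "submodule.other-docs.path other-docs".
    git config -f "$src" --get-regexp '^submodule\.' |
    while read -r key value; do
        case $key in
            *.path) value=$prefix/$value ;;   # only path values are rewritten
        esac
        git config -f "$dest" "$key" "$value"
    done
}
```

Inside the filter this would be called as something like hoist_gitmodules docs/legacy/.gitmodules .gitmodules docs/legacy, followed by the same git rm and git add steps used in the filter above.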

Upvotes: 3

torek

Reputation: 488213

I understand why this is happening. As I go through the commits to move things, there isn't an actual file for the submodule, so, in the "everything in this directory is committed exactly as you leave it" model of git filter-branch --tree-filter, there is nothing to "leave" for a submodule.

That's exactly the problem. It seems kind of nasty.

So, first question: is there some aspect of using git filter-branch where I can account for this? One place I was wondering is if I can add --commit-filter and mess with things there, but I'm not completely clear on what the invariants around a commit filter are.

You can. It's not very pretty.

A better place to do this is in --index-filter. As the documentation notes, the filters are applied in the order listed there, so the index filter runs after the tree filter. It can make any changes you like to the index that the tree filter's surrounding code wrote. (After your tree filter runs, that code calls git update-index --add --replace --remove to update index entries, automatically adding new files as needed, based on the tree your filter left behind.)
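As a rough illustration of patching the index after the tree filter has run, the sketch below re-homes a single submodule entry. A submodule is stored in the index as a mode-160000 "gitlink" entry, so moving it means adding the same commit ID at the new path and dropping the old entry. The name move_gitlink is hypothetical, and the check assumes paths without whitespace.

```shell
#!/bin/sh
# move_gitlink OLD NEW  (hypothetical helper, not part of git)
# Move one submodule (gitlink) entry from OLD to NEW in the current index.
move_gitlink() {
    old=$1 new=$2
    # ls-files --stage prints "<mode> <object> <stage><TAB><path>";
    # keep the object ID only if OLD itself is a mode-160000 entry.
    commit=$(git ls-files --stage -- "$old" |
        awk -v p="$old" '$1 == "160000" && $4 == p { print $2 }')
    # Commits that have no submodule at OLD are left untouched.
    [ -n "$commit" ] || return 0
    git update-index --add --cacheinfo 160000,"$commit","$new" &&
        git update-index --force-remove -- "$old"
}
```

Because filter-branch evaluates the filter string in its own shell, the function has to be defined there, for example with --index-filter '. /path/to/helpers.sh; move_gitlink other-docs docs/legacy/other-docs', where /path/to/helpers.sh is a placeholder for wherever you save it. The path recorded in .gitmodules still has to be updated separately.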

You can also do everything directly in the index filter, which is much faster, since it avoids actual file system operations (making directories, creating files, and so on). Index filters are just harder to write in general, though. The git update-index command can construct a whole new index or update parts of an existing one, but you have to supply the new entries yourself, typically by reading out the old index (using git ls-files --stage, perhaps) and manipulating the resulting text.
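The read-and-rewrite approach described above might look roughly like this. It follows the pattern shown in the git filter-branch documentation for moving a whole tree into a subdirectory; because it works on index entries, submodule gitlinks (mode 160000) are moved along with regular files. The function name prefix_index is made up for this sketch, and it expects GIT_INDEX_FILE to be set, as it is inside filter-branch.

```shell
#!/bin/sh
# prefix_index PREFIX  (hypothetical helper, not part of git)
# Rewrite every path in the index so that it lives under PREFIX/.
prefix_index() {
    prefix=$1
    : "${GIT_INDEX_FILE:?must be set, as it is inside filter-branch}"
    tab=$(printf '\t')
    # Each ls-files --stage line is "<mode> <object> <stage><TAB><path>".
    # Insert PREFIX/ right after the TAB (and after any opening quote that
    # git adds around unusual path names), then build a fresh index from
    # the edited listing and swap it into place.
    git ls-files --stage |
        sed "s|${tab}\"*|&${prefix}/|" |
        GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
        mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
}
```

For the layout in the question the call would be prefix_index docs/legacy. The path stored in .gitmodules is file content rather than an index path, so it still needs its own update.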

But this would let you do what you want.

Upvotes: 1
