Dorian McAllister

Reputation: 785

Difference between git filter-branch and git subtree?

I was searching through SO for an answer to this and came across this older thread, which didn't seem to give any answers. I'm reviving the question in the hope that someone may know!

Can someone tell me the difference between git subtree and git filter-branch? I'll use the same example as in the original question:

git subtree split --prefix=some_subdir -b some_branch

git filter-branch --subdirectory-filter some_subdir some_branch

Upvotes: 15

Views: 4266

Answers (2)

VonC

Reputation: 1324407

2016: Yes, git subtree (a contrib/ shell script) can be used to split repos, as described in "Using Git subtrees for repository separation" by Stu Campbell.

You need to remove the code that you have duplicated in your split folder, though (see also theamk's answer):

git subtree split --prefix=path/to/code -b split
git push ~/shared/ split:master
git rm -r path/to/code
git commit -am "Remove split code."

That differs from git filter-branch (a native Git command) which rewrites the repo history, picking up only those commits that actually affect the content of a specific subdirectory.

Meaning: there is no code to git rm once the filter-branch has been run.
git filter-branch does not duplicate commits like git subtree split does: it deletes ("filters out") everything that does not match a certain criterion (here a subfolder path).
Again, see theamk's answer for updates: there is no duplication when using a new branch: git subtree split --prefix=some_subdir -b some_branch.


Update 2021:

git filter-repo can extract the wanted paths and their history (stripping everything else):

 git switch -c some_branch
 git filter-repo --path some_subdir/ --refs some_branch

Upvotes: 7

theamk

Reputation: 1663

When executed as written, the differences are pretty minor:

  • Your "subtree split" command starts from HEAD and puts the result in some_branch, which must not exist beforehand.
  • Your "filter-branch" command starts from some_branch and puts the result back in some_branch, overwriting it with the new content.
  • In my tests, "git filter-branch" was ~50x faster (on a very old repo with only a few commits touching the selected path).

In other words, the two snippets below are exactly equivalent, as long as the history contains no special subtree rejoin commits.

git subtree split --prefix=some_subdir -b some_branch
git checkout some_branch

and

git checkout -b some_branch
git filter-branch --subdirectory-filter some_subdir some_branch

Why bother with "git subtree" then, you may ask? For its --rejoin and --onto options: they support a very specific workflow that the original author was using.
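If you want to check the equivalence yourself, here is a minimal sketch that builds a throwaway repository and runs both commands against it. The branch names via_filter and via_subtree, the file names, and the demo identity are all hypothetical, made up for this illustration. It assumes git-subtree sits in Git's exec path and skips that half otherwise.

```shell
#!/bin/sh
# Sketch only: compare the two approaches on a scratch repository.
# All names below (via_filter, via_subtree, a.txt, README) are made up.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Three commits: one touching both paths, one touching only the
# subdirectory, one touching only the top-level README.
mkdir some_subdir
echo one > some_subdir/a.txt
echo top > README
git add .
git commit -q -m "add subdir and README"
echo two >> some_subdir/a.txt
git commit -q -am "touch subdir only"
echo more >> README
git commit -q -am "touch README only"

# filter-branch rewrites an EXISTING branch in place, so create it first.
git branch via_filter
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --subdirectory-filter some_subdir via_filter >/dev/null 2>&1
filter_tree=$(git rev-parse 'via_filter^{tree}')
filter_count=$(git rev-list --count via_filter)
echo "filter-branch: tree $filter_tree, $filter_count commit(s)"

# subtree split starts from HEAD and creates a NEW branch.
subtree_tree=""
if [ -x "$(git --exec-path)/git-subtree" ]; then
  git subtree split --prefix=some_subdir -b via_subtree >/dev/null 2>&1
  subtree_tree=$(git rev-parse 'via_subtree^{tree}')
  subtree_count=$(git rev-list --count via_subtree)
  echo "subtree split: tree $subtree_tree, $subtree_count commit(s)"
else
  echo "git-subtree not found in the exec path; skipping that half"
fi
```

If both halves run, compare the two printed tree IDs: matching IDs mean both branches end with the same content, namely the files of some_subdir moved to the repository root. Note that filter-branch also leaves a backup of the rewritten ref under refs/original/.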

Upvotes: 1
