Reputation: 4870
My team is preparing to migrate to Git and we'd like to start with a small repository. The initial Git repository created by git-svn is about 10 GB in size because of binary files and hundreds of version branches.
Cleaning out the big files is easy; the tricky part seems to be the number of branches.
For the Git migration, we'd like to start at a certain point in time (X) with only certain branches (the newest ones). We do not have a "trunk"; instead we have different version branches that are each maintained over a long period of time:
---- Version 1 ------------------------
     \---------- Version 2--------------
                \--------- Version 3----
I easily found out how to clean big blobs from the history (BFG, git filter-branch).
My Question:
How can we remove all branches except a few specific ones from history so that we only have, say, branch "version 3" in the fresh repository? Ideally, we'd like the history to begin at the start commit where this branch was created:
--------- Version 3----
Is there a way to do this with git filter-branch, or is there another possibility?
Upvotes: 2
Views: 3426
Reputation: 391
I know this is several years late, but in case anyone is looking for a way to fetch just a few branches without cloning the whole thing:
Initialize the Git repository from the SVN URL in the folder tmp:
git svn init -T <main_branch_name> <repo_url> tmp
Update the ‘.git/config’ file so that only specific branches are fetched.
In this case, we’re going to fetch only the branches that match the pattern feature*:
[svn-remote "svn"]
    noMetadata = 1
    url = <repo_url>
    fetch = trunk:refs/remotes/origin/trunk
    branches = branches/feature*:refs/remotes/origin/* # added line
Now you can fetch from the SVN repository, starting at revision $NUMBER:
git svn fetch -r $NUMBER:HEAD
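If you know the exact branch names rather than a pattern, git-svn's refspec also accepts a brace list, as far as I can tell from its documentation. Here is a minimal sketch that writes such a line with git config instead of editing the file by hand. The branch names Version_3 and Version_4 and the branches/ layout are assumptions made up for illustration, and the throwaway repository only stands in for the tmp folder:

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the tmp folder created by `git svn init`.
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical SVN layout: branches live under branches/ and the ones we
# want are called Version_3 and Version_4.
git -C "$repo" config svn-remote.svn.branches \
  'branches/{Version_3,Version_4}:refs/remotes/origin/*'

# Read the setting back to confirm what git svn fetch would see.
branches_spec=$(git -C "$repo" config --get svn-remote.svn.branches)
echo "$branches_spec"
```

In a real migration you would run the git config line inside the repository created by git svn init, before the first git svn fetch.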
Upvotes: 4
Reputation: 388403
Import the whole repository into Git, and then throw away the branches you are not interested in.
The throw away part would be the interesting one :D How can we throw them away and eradicate them from the repo history?
Well, the way Git works, branches are just pointers to commits within the history of the repository. A branch exists only because such a pointer exists. If you remove the pointer, the branch just disappears. And if nothing else points to those commits, they are essentially removed from the repository.
Now, besides branches, there is another prominent thing that usually points at commits and keeps them around: newer commits that depend on them. Git’s history is a large directed acyclic graph in which each commit points to its parent commits. Because of that, old commits stay around even when no branch is explicitly pointing at them, and that is what makes the whole history work.
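To make the pointer idea concrete, here is a small sketch in a throwaway repository (the branch name and commit messages are made up). It shows that a branch name resolves to a single commit id, and that an older commit without a branch of its own is still reachable through the parent link of the tip:

```shell
#!/bin/sh
set -eu

# Throwaway repository; nothing here touches a real repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/Version_1
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "first"
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "second"
second=$(git rev-parse HEAD)

# A branch is just a name that resolves to one commit id.
branch_tip=$(git rev-parse refs/heads/Version_1)

# The older commit has no branch of its own, but the tip names it as its parent,
parent_of_tip=$(git rev-parse "Version_1^")

# so it counts as reachable from the branch.
if git merge-base --is-ancestor "$first" Version_1; then
  reachable=yes
else
  reachable=no
fi
echo "tip=$branch_tip parent=$parent_of_tip reachable=$reachable"
```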
So if you want to get rid of a whole line of commits (a separate branch), and those commits were not merged into another branch at some point, then all you need to do is remove the branch from the repository. Then, nothing will point to the line of commits and they will be removed when you garbage-collect the repository:
git branch -D Version_1
git branch -D Version_2
git gc --prune=now
This will force-delete branches Version_1 and Version_2 from the repository, and afterwards run the garbage collection that removes every object from the repository which has no pointer pointing to it.
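With hundreds of branches, deleting them one by one gets tedious. The following is a rough sketch, not a tested migration script, that deletes every local branch except those on a keep list and then prunes. It runs in a throwaway repository whose branch names mirror the question. Two assumptions to be aware of: reflogs can still reference the deleted tips, so the sketch expires them before pruning, and a git-svn import usually stores its branches as remote-tracking refs under refs/remotes/, in which case you would need `git branch -D -r origin/<name>` instead.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository; nothing here touches a real repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/Version_1
git config user.name "Demo"
git config user.email "demo@example.com"

# Three version branches forking one after another, as in the question.
git commit -q --allow-empty -m "v1: start"
git checkout -q -b Version_2
git commit -q --allow-empty -m "v2: start"
git checkout -q -b Version_3
git commit -q --allow-empty -m "v3: start"
git checkout -q Version_1
git commit -q --allow-empty -m "v1: only on Version_1"
only_v1=$(git rev-parse HEAD)
git checkout -q Version_3

# Space-separated list of branches to keep (hypothetical names).
keep="Version_3"

for b in $(git for-each-ref --format='%(refname:short)' refs/heads); do
  case " $keep " in
    *" $b "*) ;;                      # on the keep list, leave it alone
    *) git branch -q -D "$b" ;;       # everything else is force-deleted
  esac
done

# Reflogs still reference the deleted tips, so expire them before pruning.
git reflog expire --expire=now --all
git gc -q --prune=now

git for-each-ref --format='%(refname:short)' refs/heads
```

The commit that existed only on Version_1 is no longer in the object store after this, while the commits Version_3 inherited from the older branches are still there.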
Afterwards, you have the full history left for Version_3, including those parts from the other two versions that are part of its history. If you want to remove that as well, you can apply the method explained in this question to remove the old history before the branch point of version 3.
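One way to cut off the old history, sketched here under assumptions and not necessarily the method the linked question describes, is to graft the branch point so that it has no parents and then let git filter-branch make the graft permanent. The repository, branch name and commit messages below are made up for the demo. Note that the Git documentation nowadays recommends git filter-repo over filter-branch for history rewrites, so treat this as an illustration of the idea.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository; nothing here touches a real repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/Version_3
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "old history inherited from Version_1"
git commit -q --allow-empty -m "Version_3 branch point"
new_root=$(git rev-parse HEAD)
git commit -q --allow-empty -m "work on Version_3"

# Pretend the branch point has no parents...
git replace --graft "$new_root"
# ...then rewrite the branch so that the graft becomes real history.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -- Version_3 >/dev/null 2>&1

# Drop the leftovers: the replacement ref and filter-branch's backup refs.
git replace -d "$new_root" >/dev/null
git for-each-ref --format='%(refname)' refs/original |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

commit_count=$(git rev-list --count Version_3)
echo "commits left on Version_3: $commit_count"
```

Every commit from the branch point onwards gets a new id, so do this before anyone clones the migrated repository.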
Upvotes: 3