Patrick

Reputation: 4870

SVN to Git migration: Only import certain branches and history

My team is preparing to migrate to Git, and we'd like to start with a small repository. The initial Git repository created by git-svn is about 10 GB in size because of binary files and hundreds of version branches.

Cleaning big files out is easy; the tricky part seems to be the number of branches.

For the Git migration, we'd like to start at a certain point in time (X) with only certain (the newest) branches. We do not have a "trunk"; instead, we have several version branches that are maintained over long periods:

 ---- Version 1 ------------------------
     \---------- Version 2--------------
                \--------- Version 3----

I have already found out how to remove big blobs from the history (BFG, git filter-branch).

My Question:

How can we remove all branches except a few specific ones from the history, so that the fresh repository contains only, say, the branch "Version 3"? Ideally, the history would begin at the commit where this branch was created:

 --------- Version 3----

Is there a way to do this with git filter-branch, or by some other means?

Upvotes: 2

Views: 3426

Answers (2)

Oscar Vasquez

Reputation: 391

I know this is several years late, but in case anyone is looking for a way to select just a few branches without cloning the whole repository:

Initialize the Git repository in the folder tmp, pointing it at the SVN URL (-T names the trunk path):

git svn init -T <main_branch_name> <repo_url> tmp

Update the ‘.git/config’ file to import only specific branches. In this case, we import only the branches that match the pattern feature*:

[svn-remote "svn"]
   noMetadata = 1
   url = <repo_url>
   fetch = trunk:refs/remotes/origin/trunk
   branches = branches/feature*:refs/remotes/origin/*  ## Added line

Now fetch from the SVN repository. $NUMBER is the SVN revision at which the imported history should start:

git svn fetch -r $NUMBER:HEAD
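If you prefer to script the .git/config change instead of editing the file by hand, git config can write the same keys. This is a sketch, assuming the SVN branches live directly under branches/ and there is no trunk (as in the question); the URL and the branch names are hypothetical. Instead of the feature* wildcard, it lists the wanted branches by name using git-svn's comma-separated brace syntax. Plain git init stands in for git svn init only so the sketch runs without git-svn installed; in a repository created by git svn init, adjust the lines it generated rather than duplicating them.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"                           # scratch directory for the sketch
repo_url="https://svn.example.com/repo"     # hypothetical SVN URL

# Stand-in for `git svn init` (which needs git-svn): create the repo and url key by hand
git init -q tmp && cd tmp
git config svn-remote.svn.url "$repo_url"

# Track only the named branches (hypothetical names) instead of a feature* pattern.
# Without --add, this sets the key, replacing a single existing branches line.
git config svn-remote.svn.branches 'branches/{Version_2,Version_3}:refs/remotes/origin/*'

# Show the resulting [svn-remote "svn"] keys
git config --get-regexp '^svn-remote\.svn\.'

# With git-svn installed, the import would then be:
#   git svn fetch -r "$NUMBER":HEAD
```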


Upvotes: 4

poke

Reputation: 388403

Import the whole repository into Git, and then throw away the branches you are not interested in.

The throw-away part would be the interesting one :D How can we throw them away and eradicate them from the repo history?

Well, the way Git works, branches are just pointers to commits within the history of the repository. A branch exists only because such a pointer exists. If you remove the pointer, the branch disappears. And if nothing else points to those commits, they become unreachable and are eventually removed from the repository.
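To see this in practice, here is a minimal sketch in a throwaway repository (the branch name Version_1, the file name, the commit messages and the demo identity are made up). It deletes the only branch pointing at a commit and then checks whether the commit object still exists after garbage collection. The reflog is expired as well, because the reflog also acts as a pointer to recently visited commits.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"                        # scratch directory; nothing else is touched
git init -q demo && cd demo
git config user.name "Demo User"         # throwaway identity for the demo repository
git config user.email "demo@example.com"

git commit -q --allow-empty -m "shared base"
git checkout -q -b Version_1             # new pointer: refs/heads/Version_1
echo "v1 only" > v1.txt
git add v1.txt
git commit -q -m "commit only reachable from Version_1"
v1_commit=$(git rev-parse Version_1)
git checkout -q -                        # back to the initial branch

git branch -q -D Version_1               # remove the only branch pointing at v1_commit
git reflog expire --expire=now --all     # HEAD's reflog still mentions it; drop that too
git gc -q --prune=now                    # delete every object that is now unreachable

if git cat-file -e "$v1_commit" 2>/dev/null; then
  result="still present"
else
  result="removed"
fi
echo "$result"                           # prints: removed
```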

Besides branches, there is another prominent thing that points at commits and keeps them around: newer commits that depend on them. Git's history is a directed acyclic graph in which each commit points to its parent commits. That is why old commits stay around even when no branch explicitly points at them, and that is how the whole history works.
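A sketch of this second kind of pointer, again in a throwaway repository with made-up names: the branch feature is merged before it is deleted, so the merge commit still lists the branch tip as one of its parents, and garbage collection keeps that commit. git cat-file -p shows the parent lines that form these links.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"                        # scratch directory for the demo
git init -q demo && cd demo
git config user.name "Demo User"         # throwaway identity for the demo repository
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"
git checkout -q -b feature
echo feature >> file.txt
git commit -q -am "work on feature"
feature_commit=$(git rev-parse feature)
git checkout -q -                        # back to the initial branch
git merge -q --no-ff --no-edit -m "merge feature" feature

git cat-file -p HEAD | grep '^parent'    # two parent lines: the old tip and feature_commit

git branch -q -D feature                 # the branch pointer is gone...
git reflog expire --expire=now --all
git gc -q --prune=now                    # ...but the merge commit still points at its tip

if git cat-file -e "$feature_commit" 2>/dev/null; then
  result="still present"
else
  result="removed"
fi
echo "$result"                           # prints: still present
```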

So if you want to get rid of a whole line of commits (a separate branch), and those commits were not merged into another branch at some point, then all you need to do is remove the branch from the repository. Then, nothing will point to the line of commits and they will be removed when you garbage-collect the repository:

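git branch -D Version_1
git branch -D Version_2
git reflog expire --expire=now --all
git gc --prune=now

This force-deletes the branches Version_1 and Version_2, expires the reflog (whose entries would otherwise keep recently checked-out commits reachable), and then runs garbage collection, which removes every object that is no longer reachable from any remaining reference. Note that git svn imports SVN branches as remote-tracking branches (for example origin/Version_1); to delete those, use git branch -r -D origin/Version_1 instead.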
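The following sketch recreates the branch layout from the question in a throwaway repository (the branch names follow the question; the empty commits, their messages and the demo identity are made up) and then deletes Version_1 and Version_2. It also expires the reflog before running git gc, since reflog entries keep recently checked-out commits reachable until they expire.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"                              # scratch directory for the demo
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/Version_1     # first commit goes onto Version_1
git config user.name "Demo User"               # throwaway identity for the demo repository
git config user.email "demo@example.com"

commit() { git commit -q --allow-empty -m "$1"; }   # helper: one empty commit per call

commit "v1: start"
commit "v1: branch point of Version_2"
git branch Version_2
commit "v1: only on Version_1"
only_v1=$(git rev-parse HEAD)

git checkout -q Version_2
commit "v2: branch point of Version_3"
git branch Version_3
commit "v2: only on Version_2"
only_v2=$(git rev-parse HEAD)

git checkout -q Version_3
commit "v3: work on Version_3"

# Delete the old version branches, expire the reflog, and garbage-collect
git branch -q -D Version_1 Version_2
git reflog expire --expire=now --all
git gc -q --prune=now

git log --format=%s Version_3
```

Afterwards only refs/heads/Version_3 remains, and its log lists four commits: its own commit plus the three commits from Version_1 and Version_2 that precede its branch point. The commits that existed only on Version_1 or Version_2 are gone.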

Afterwards, only the full history of Version_3 is left, including the parts of the other two versions that are part of its history. If you want to remove those as well, you can apply the method explained in this question to remove the old history before the branch point of Version_3.
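One possible way to cut off the history before the branch point is sketched here with git commit-tree and git rebase --onto (not necessarily the method from the linked question): create a new parentless commit that has the same snapshot as the branch point, then replay the later Version_3 commits on top of it. This assumes the Version_3 history after the branch point is linear, since rebase does not keep merge commits by default, and every commit ID after the cut changes. The repository, file and commit messages are made up, and branch_point is a hypothetical variable standing in for the real branch-point commit.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"                              # scratch directory for the demo
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/Version_3
git config user.name "Demo User"               # throwaway identity for the demo repository
git config user.email "demo@example.com"

# Inherited history, then Version_3's own commits
for n in 1 2 3; do echo "old $n" > file.txt; git add file.txt; git commit -q -m "old $n"; done
branch_point=$(git rev-parse HEAD)             # hypothetical: the commit Version_3 started from
for n in 1 2; do echo "v3 $n" > file.txt; git add file.txt; git commit -q -m "v3 $n"; done

# 1. New root commit: same snapshot (tree) as the branch point, but no parents
new_root=$(git commit-tree -m "Start of Version_3 history" "$branch_point^{tree}")

# 2. Replay the commits after the branch point onto the new root
git rebase -q --onto "$new_root" "$branch_point" Version_3

git log --format=%s Version_3
```

The old commits stay in the reflog until they are pruned by expiring the reflog and running git gc --prune=now.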

Upvotes: 3
