Reputation: 3562
I have a Subversion server with a few different projects in the standard layout like so:
ProjectA/
    trunk/
    branches/
    tags/
ProjectB/
    trunk/
        FolderOfBinaries/
        SourceFolderA/
        SourceFolderB/
        SourceFolderC/
    branches/
    tags/
        v1.0/
        v1.1/
        v2.0/
ProjectC/
    trunk/
    branches/
    tags/
ProjectB is going to be migrated to Git, but not with a standard clone. I want to split the project into two Git repositories - one for the folder full of large binaries that change relatively often and another repository for everything else. I did a clone of the repository in full and it's a few GBs, but the binaries folder is probably 90% of that, and running git gc takes a long time. I'd rather have a small, fast repository and then add the binaries folder as a submodule if the developer requires it.
I've found two potential options so far. First, I could use git filter-branch to try to remove the folder of binaries from the history, as shown in the Git Book. Second, I could use svndumpfilter to split the current Subversion repository into two and then git svn clone each separately.
My question, though, is what will happen to all the history, and particularly the branches and tags? I'd still like to know what the folder of binaries looked like at every tag in the project, even though the binaries may not have changed between two tags. Is that possible?
Edit: The folder of binaries is not full of build artefacts (*.class, *.o, *.dll etc) so I can't just strip it out and make them external. It's full of binaries that are output from a third-party program that need to be versioned (think OpenOffice documents, Photoshop files etc.).
Upvotes: 4
Views: 2517
Reputation: 3562
Well, I've managed to do this, but it wasn't all that straightforward. There may be a better way but not one that I could work out. I did the following:
1. Create a dump of the current repository: svnadmin dump /opt/repo > full_dump
2. Filter the dump to remove the binaries folder: svndumpfilter exclude '*folderofbinaries*' --pattern --renumber-revs --drop-empty-revs < full_dump > filtered_dump (the pattern is quoted so that the shell does not expand it). I needed to make folderofbinaries a pattern because, way back in the past, someone had checked a binary directly into a tag (!), so the next step was failing due to a missing folder.
3. Create a local SVN repository from the filtered dump:
   mkdir repo-filtered
   svnadmin create repo-filtered
   svnadmin load repo-filtered < filtered_dump
4. Clone both the full and the filtered repo into different folders (I used svn2git). The filtered repo will not contain any of the binaries. If, in the full repo, only the binaries folder changed between tags A and B, then in the new filtered Git repo the two tags will point to the same commit, which is exactly what I wanted.
5. In the full Git repo, use Git to strip out everything except the binaries folder.
The reason that I had to use Git to isolate the binaries folder was that I couldn't work out how to maintain the tags using svndumpfilter alone (especially given that I had a binary committed directly into a tag). After the conversion I get the same behaviour as in the filtered repo - if no binaries changed between two tags then they both point to the same commit.
The commands for the final step were:
git checkout master
git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter folderofbinaries -- --all
git reset --hard
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now
which I got from this question.
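To check the tag behaviour described above, something like the following could be run in either converted repository. It is only a sketch: list_tag_commits is a made-up name, and the -C option assumes Git 1.8.5 or later. It prints the commit each tag resolves to, sorted so that tags sharing a commit end up on adjacent lines.

```shell
# Hypothetical helper: print "<commit> <tag>" for every tag in a repository.
# Tags that point at the same commit appear on adjacent lines after sorting.
list_tag_commits() {
    repo=${1:-.}
    git -C "$repo" tag --list | while read -r tag; do
        # rev-list -n 1 resolves both lightweight and annotated tags to a commit
        printf '%s %s\n' "$(git -C "$repo" rev-list -n 1 "$tag")" "$tag"
    done | sort
}
```

For example, list_tag_commits /path/to/binaries-repo (the path is a placeholder) would show at a glance which releases shipped with identical binaries.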
Now I have an 80MB sources repository and a 1.5GB binaries repository from my original 4.4GB SVN dump file! I can recreate the exact state of the original SVN repo by adding the binaries folder as a Git submodule of the sources repo and checking out the same tag on each (which is why I needed to preserve all the tag info) whilst not having one mammoth Git repo that's slow to work with.
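A rough sketch of how the two repositories could be tied back together follows. The function names, the submodule URL and the "binaries" path are all placeholders rather than anything from my actual setup. Because the tags converted from SVN predate the commit that adds the submodule, the sketch checks out the tag in each repository explicitly instead of relying on git submodule update.

```shell
# Hypothetical helpers; the URL and the "binaries" path are placeholders.

# One-time setup, run inside a clone of the sources repository.
add_binaries_submodule() {
    url=$1
    path=${2:-binaries}
    git submodule add "$url" "$path" &&
    git commit -m "Add binaries repository as a submodule"
}

# Check out the same tag in the sources repo and in the binaries checkout.
# Run from the top level of the sources repository.
checkout_tag_everywhere() {
    tag=$1
    path=${2:-binaries}
    git checkout -q "$tag" &&
    git -C "$path" checkout -q "$tag"
}
```

After add_binaries_submodule has been run once with the URL of the binaries repository, checkout_tag_everywhere v1.1 should leave both working trees at the state of that tag.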
Upvotes: 1
Reputation: 107030
Take a look at svndumpfilter. It's pretty simple to use. You do a Subversion repository dump, and then use the filter to either say what you want or what you don't want.
Do a dump of your current repository, then run svndumpfilter once for each Git repository you want to create. Invocations can be chained with a pipe; the following keeps only ProjectB and then drops the binaries folder:
$ svndumpfilter include ProjectB < svn_repo_dump | svndumpfilter exclude ProjectB/trunk/folderofbinaries > svn_repos_no_binaries
I do want to mention one thing: Don't store built binary objects in your repository. In Subversion, they're impossible to remove without a dump and filter, and even in version control systems with the ability to obliterate revisions, doing so takes a lot of time and effort. It's a big maintenance headache.
And for what? Storing binaries in a version control system doesn't really help. You can't diff binaries, the history doesn't help, and they are hard for non-developers to access.
Instead, use a release repository, and store your binaries there. You can use a Maven repository manager like Artifactory or Nexus even if you use neither Maven nor Java.
Upvotes: 1
Reputation: 5255
I would recommend using svndumpfilter first to split ProjectB into two repositories. Afterwards you can use git svn clone to convert the new SVN repositories into Git repositories.
When the paths passed to svndumpfilter include cover the trunk, branches, and tags folders, the full history of the split repositories will be preserved. So you can still look at the whole history of FolderOfBinaries in the new binaries repository.
When you create the Git repositories using git svn clone, the content of the branches folder is imported as Git branches and the content of the tags folder as refs in a tags/ namespace, which can then be turned into real Git tags (tools such as svn2git do this for you).
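A hedged sketch of that conversion is shown here. The function names, URLs and directory names are placeholders. The first function relies on the --stdlayout option of git svn, which tells it to expect trunk/, branches/ and tags/. The second turns the refs that git svn leaves under refs/remotes/origin/tags (the default prefix in recent Git versions, as far as I know) into ordinary Git tags.

```shell
# Hypothetical helpers; URLs and directory names are placeholders.

# Clone an SVN repository that uses the trunk/branches/tags layout.
clone_stdlayout() {
    git svn clone --stdlayout "$1" "$2"
}

# Create a Git tag for every ref found under the SVN tags namespace.
# The default namespace assumes the "origin/" prefix of recent Git versions.
promote_svn_tags() {
    repo=${1:-.}
    ns=${2:-refs/remotes/origin/tags}
    git -C "$repo" for-each-ref --format='%(refname)' "$ns" |
    while read -r ref; do
        # Strip the namespace so refs/remotes/origin/tags/v1.0 becomes v1.0
        git -C "$repo" tag "${ref#"$ns"/}" "$ref"
    done
}
```

Usage might look like clone_stdlayout file:///path/to/new-svn-repo/ProjectB projectb-git followed by promote_svn_tags projectb-git. If your Git version uses a different ref prefix, pass the namespace as the second argument.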
Upvotes: 1