Reputation: 367
We have a very large legacy CVS repo (66 GiB) that has accumulated over a decade and is still growing. We now have some subcontractor companies that need to work on some modules and branches.
We need to create some branches for them and send them those branches. We also need to merge their changes into our main branches from time to time.
Our concerns are:
- We absolutely cannot give them the whole repo, mostly for security reasons.
- We need to send them some history, not just the "HEAD" version of the code.
- We are still doing development work ourselves, so we need to send them changesets from time to time.
Are Git and Mercurial good choices for migrating from CVS? Can Git/Mercurial satisfy our needs?
EDIT: I think we actually need a centralized revision control system with a multi-site feature: the ability to create an off-site repo based on part of the central repo, and to merge easily between sites.
Upvotes: 4
Views: 396
Reputation: 4772
I'll let the other posters answer the subtree and subhistory questions, because I'm not as familiar with that. However, I can tell you a few things about the size of the repo. First, your git repo will very likely be much smaller than your CVS (I would guess it will be between a tenth and a half of the current 66GiB).
Second, yes, if you put your entire CVS repo into a single git repo, then your internal developers will have a copy of the entire repo on their individual PCs. The git repo that I work with on a daily basis is 12GB, and it doesn't cause any real problems. Assuming that your repo is large because your working copy is large, it actually saves significant time when you want to move between branches because you're not fetching so many files over the network. For us, the 12GB git repo isn't that big of a deal, because my current working copy (with build objects for most targets) is an additional 37GB on top of the git repo itself. On a repository of this size, git's commands work much faster than subversion's did.
So definitely read what everyone else says about the subtrees and modules, etc. but rest assured that you can probably just import the whole thing if you have to.
Upvotes: 0
Reputation: 7755
We have a very large legacy CVS repo (66 GiB) that has accumulated over a decade and is still growing. We now have some subcontractor companies that need to work on some modules and branches.
We need to create some branches for them and send them those branches. We also need to merge their changes into our main branches from time to time.
It sounds like you're wanting to transition only for the subcontractors, and not for everyone else. I strongly suggest you don't do this. Either convert everyone or convert no-one. Running a mixed system is a huge pain, especially when it comes to taking the changes from the people on the DVCS.
Our concerns are:
- We absolutely cannot give them the whole repo, mostly for security reasons.
Is it that you have multiple modules in your CVS repo, but can't give them all modules, or you want to limit the history they can access?
DVCSs work much better when modules are stored as separate repositories, not multiple modules in one repository*. There are many reasons for this, but mainly it's so that changes in different modules don't cause unnecessary merges.
(* as do CVCSs, but it's normally such a pain to create a new module that people only do it once. I suspect you wouldn't have 66GB if it was split.)
So if you do convert, you will want to separate the modules. This would then allow you to share some modules and not others. I know Mercurial can create a repo from a set of paths within a multi-module repo during conversion. I expect Git has similar capabilities.
- We need to send them some history, not just the "HEAD" version of the code.
This almost dictates a DVCS. It's a defining attribute.
- We are still doing development work ourselves, so we need to send them changesets from time to time.
...and this is why you should be using the same VC tool as them. Otherwise you'll spend all your time converting changesets between systems.
Are Git and Mercurial good choices for migrating from CVS? Can Git/Mercurial satisfy our needs?
Yes & Yes, but it's not a push button transition. It needs planning, commitment and education.
EDIT: I think we actually need a centralized revision control system with a multi-site feature: the ability to create an off-site repo based on part of the central repo, and to merge easily between sites.
A centralised, but distributed, version control system. Got ya!
Final point: don't confuse centralised/distributed development practice with centralised/distributed tools. It's perfectly reasonable to work in a centralised development model with a distributed VCS.
Upvotes: 0
Reputation: 116407
66 GB sounds like a lot. However, CVS is known to not store data very efficiently.
Git will certainly work for you, but you will have to split your project into a few smaller git repositories. For most projects, it is not very difficult to split functionality into a few self-contained subprojects (often they are subdirectories).
Typically you want to limit the size of any given git repository to less than 1-2 GB on average, and certainly it should not exceed 5-10 GB. However, keep in mind that git is exceptionally good at compressing its metadata (as long as you run git gc once in a while).
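As a rough illustration of what git gc does, here is a minimal sketch that builds a throwaway repository and counts loose objects before and after packing. The repository name, file name and commit messages are made up for the demo; only git init, git commit, git gc and git count-objects are real commands.

```shell
set -e
# Hypothetical scratch location and repo name, used only for this demo.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name demo
git config user.email demo@example.com

# Create a few commits so there are loose objects to pack.
for n in 1 2 3; do
    echo "revision $n" > data.txt
    git add data.txt
    git commit -q -m "revision $n"
done

# "count" is the number of loose (unpacked) objects.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before gc: $loose_before, loose after gc: $loose_after, packed: $packed_after"
```

On a real repository, the size-pack line of git count-objects -v is probably the more interesting number to watch, since it reports the on-disk size of the packed history.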
Now, once you have split your project into a few subprojects ('few' is a relative term - Android has 300+), you need to figure out how to "glue" them together into a coherent directory structure once again.
For this, there are 2 common approaches:
- The repo tool developed by the Android project. It involves creating a small git repository containing just one XML file (called a manifest) which tracks where your subprojects are checked out and how they are glued together. This works really well on Linux and Mac, but unfortunately does not support Windows (repo requires symbolic link support from the OS).
- git submodules. Create one git repository without any real files, and add all of your original subprojects to this repository as submodules. In a sense, this super git repo plays essentially the same role as the Android repo manifest. The advantage of this approach is that it is supported on any OS, including Windows.
Now, if you want to share only small portions of your gigantic project, you can do so by sharing any submodule/subproject directly with your partners as a standard git repository.
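The submodule approach can be sketched roughly as follows. This is only a demo under stated assumptions: module-a, super and the modules/ directory are hypothetical names, the subproject is a local throwaway repo rather than a real server URL, and the protocol.file.allow setting is there only because recent Git versions refuse local-path submodule clones by default.

```shell
set -e
# Hypothetical scratch location; all repo names below are made up.
work=$(mktemp -d)
cd "$work"

# A stand-in for one of the split-out subproject repositories.
git init -q module-a
echo "module code" > module-a/README.txt
git -C module-a add README.txt
git -C module-a -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial module-a commit"

# The "super" repository holds no real files, only submodule references.
git init -q super
cd super
# protocol.file.allow=always is only needed because the demo uses a local path.
git -c protocol.file.allow=always submodule add "$work/module-a" modules/module-a
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add module-a as a submodule"

# List the paths recorded in .gitmodules.
submodule_paths=$(git config -f .gitmodules --get-regexp 'submodule\..*\.path' | awk '{print $2}')
echo "$submodule_paths"
```

A partner who should only see module-a would be given access to that one repository and never to the super repo.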
In fact, to make it more convenient, I would highly recommend installing Gerrit - a git server implementation in Java, which also happens to be an extremely powerful code review engine (also used by the Android project). Gerrit's code review function is fully optional (you don't have to use it if you don't want to), but you will certainly enjoy Gerrit's unified user authentication, ssh key management and ability to control access permissions per git repository. This makes it very convenient to share with 3rd parties - you just give them access to small parts using Gerrit, and you're done.
Upvotes: 3
Reputation: 129744
Choose git. Prefer submodules over subtrees if you can, as you can better control dependencies between projects and their respective subprojects.
Upvotes: 0
Reputation: 994471
With Git, you can use the git subtree command to "snip" out subdirectories that you can give to your subcontractors, and then easily reintegrate their changes into your mainline. You can also give them updates periodically if you need to. The git subtree command was originally an add-on but has been rolled into the contrib directory of the official Git distribution.
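A minimal sketch of the "snip out a subdirectory" step, assuming git subtree is installed alongside your Git (it is packaged separately on some systems). The directory layout, branch names and the handoff.git repository are hypothetical and exist only for this demo.

```shell
set -e
# Hypothetical scratch location and layout, for illustration only.
work=$(mktemp -d)
cd "$work"
git init -q mainline
cd mainline
git config user.name demo
git config user.email demo@example.com

mkdir -p modules/shared modules/secret
echo "shared v1" > modules/shared/lib.txt
echo "internal" > modules/secret/keys.txt
git add .
git commit -q -m "initial import"
echo "shared v2" > modules/shared/lib.txt
git commit -q -am "update shared module"

# Extract the history of modules/shared onto its own branch.
git subtree split --prefix=modules/shared -b export-shared >/dev/null

# Push only that branch to a separate bare repo for the subcontractor.
git init -q --bare "$work/handoff.git"
git push -q "$work/handoff.git" export-shared:refs/heads/main

handoff_files=$(git --git-dir="$work/handoff.git" ls-tree -r --name-only main)
handoff_commits=$(git --git-dir="$work/handoff.git" rev-list --count main)
echo "$handoff_files ($handoff_commits commits)"
```

Pushing the split branch into a fresh repository, rather than copying the original, is what keeps modules/secret out of what the subcontractor receives. Their changes could later be brought back with git subtree pull or git subtree merge against the same prefix.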
It is possible to limit the amount of history you include in a repository you give to an external user.
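One way to do this, though not necessarily the only one, is a shallow clone. The sketch below assumes a throwaway repository with five commits and hands over only the latest two; the names full and handoff are made up, and the file:// URL is used because Git ignores --depth for plain local-path clones.

```shell
set -e
# Hypothetical scratch location and repo names, for illustration only.
work=$(mktemp -d)
cd "$work"
git init -q full
cd full
git config user.name demo
git config user.email demo@example.com

# Build a small history of five commits.
for n in 1 2 3 4 5; do
    echo "revision $n" > notes.txt
    git add notes.txt
    git commit -q -m "revision $n"
done
cd "$work"

# Clone only the two most recent commits.
git clone -q --depth 2 "file://$work/full" handoff

full_count=$(git -C full rev-list --count HEAD)
handoff_count=$(git -C handoff rev-list --count HEAD)
echo "full: $full_count commits, handoff: $handoff_count commits"
```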
I expect your largest concern, though, will be around the move to a DVCS with such a large starting repo. Git will compress your repo so it's unlikely to be 66 GB when you're done, but it will still be rather unwieldy (probably on the order of 10 GB, depending on what you've got stored in there). If you don't consider that a problem, then go for it.
I have limited my answers to Git because I'm more familiar with Git than Mercurial.
Upvotes: 5