Reputation: 122
I am currently in charge of improving some poor development practices. I inherited a production server with 300+ websites running on it, all with semi-similar code bases, none of which are perfectly identical. There has been no source code management in place for any of the sites. The development team has been working with the old practice of copying a directory as a backup instead of being able to roll back changes. This also makes it hard to track down who has done what in a site's code base, especially for "quick fixes". The logical conclusion for me is that we need to employ SCM. Git is my choice, since it is easy to use and get up and running with, and it has a ton of documentation on how to use it and resolve issues that could arise. The only problem is that the documentation revolves around single-site usage, not high-capacity production environments.
I am having trouble finding any documentation on how to employ Git in a production environment with this many sites. My previous experience with Git has been in environments with fewer than 10 projects, each in its own repo; some of those projects had thousands of sites, all derived from a single code base. My first thought was to make every site its own repo so it could be branched and developed individually without affecting any other site. A few people I have talked with about this have said to make all 300 sites one massive repo and just push and pull that entire repo, which would be nearly 300GB of data being moved around. I realize Git does incremental pushes and pulls, so it would not be 300GB of data each time; however, a single git status could still have to scan thousands of files. That seems like overkill and has the potential for a lot to go wrong, especially with 5-10 of us working on multiple sites in the same massive repo.
Which would be the best route in this case: one single massive repo, or hundreds of smaller repos? Or is there another option I am missing?
Upvotes: 2
Views: 103
Reputation: 70913
I think putting all sites into one repository is not the best option, for several reasons.
You are probably looking at a huge refactoring task across all the sites, because they seem to use nearly the same code; but I wonder whether that is really the case, and whether unifying them would help you anyway.
You will probably discover that you are using, for example, ten or twenty slightly different versions of a database layer or a logger. And each difference cannot simply be removed, because it is essential for the site that uses it, and it is incompatible with every other site because a method there has a slightly different signature than anywhere else. Being able to create the one true version of the source code shared by all sites won't help you, because it would be a huge amount of work to make that code usable everywhere.
Do one step at a time. First establish version control: one repo per site allows you to gradually create all the repositories that are needed, as in the sketch below.
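A rough sketch of that gradual conversion, assuming the sites live as directories under a path like /var/www/sites (both the path and the commit message are placeholders for your actual layout):

    # Turn each site directory into its own independent repository,
    # capturing its current state as the first commit.
    cd /var/www/sites
    for site in */; do
        (
            cd "$site"
            git init
            git add -A
            git commit -m "Initial import of ${site%/}"
        )
    done

You don't have to convert everything at once; running this per directory as each site gets touched works just as well.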
After that, you can create further repositories for a set of libraries containing the code that really can be shared, or replace the parts that have diverged too much with something entirely different from external sources; whatever allows you to keep maintaining these sites.
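If some shared code does get extracted later, one Git-native way to pull it back into the per-site repos is a submodule; the URLs and paths below are purely illustrative:

    # Inside one site's repository, add the shared library
    # (now living in its own repo) as a submodule.
    git submodule add https://git.example.com/shared/logger.git lib/logger
    git commit -m "Track shared logger as a submodule"

    # Later clones of that site need to fetch submodule content too.
    git clone --recurse-submodules https://git.example.com/sites/site-a.git

A dependency manager for your language would be an equally valid way to distribute such shared code.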
Upvotes: 3
Reputation: 440
I would strongly advise you to use a separate repo for each site/web app. Or at the very least, break the 300+ sites into smaller clusters of closely related sites, each in a repo of maybe 10 or so sites, or perhaps divide by developer team... but don't have one massive repo!
Though it's perfectly possible to have a huge repo, it's bad practice and, depending on how big the repo gets, probably a bad idea. The bigger the repo, the messier any structural/file changes become, and even simple renames and merges turn into a mess to deal with. Also, going "back in time" in your source history becomes virtually impractical if Git needs to update thousands of files to do so.
Also, for backup and deployment purposes, you want smaller repos. We had a huge .NET solution repo with upwards of 30 projects in it, and it took half an hour just to clone. It was bad. We trimmed it down, removed any non-source-code content (PDFs, images, binaries), and moved projects that should have been on their own into separate repos. It's much better and faster now, and navigating through the history is a breeze. You can also make use of cloud storage like Amazon S3 for static, non-source-code files.
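One way that kind of trimming might look in practice, keeping in mind that the paths and patterns below are only examples and depend on your stack:

    # Stop tracking build output and bulky binaries, and ignore
    # them going forward (paths/patterns are examples only).
    git rm -r --cached bin/ obj/
    printf 'bin/\nobj/\n*.dll\n*.pdf\n' >> .gitignore
    git add .gitignore
    git commit -m "Stop tracking build output and binaries"

Note that this only stops tracking the files from now on; anything already committed still sits in the history and would need a history rewrite to actually shrink the repo.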
We're using NuGet for dependencies and external libraries. I'm not sure what framework/language you're using, but there are plenty of non-.NET tools to help you manage this kind of thing. Hope this helps.
PS: Though with GitHub it's cheaper to use fewer repos... it might be better to look for other Git hosts that charge only by number of developers... Bitbucket comes to mind...
Upvotes: 2
Reputation: 116397
You say that your "sites" are very similar and were probably derived from the same code base, so with high probability they will have a lot of identical files (or files whose content differs very little).
Remember that Git is extremely efficient in the way it stores its data: its delta compression is optimized to store similar chunks in the repository only once. With that in mind, you should really try putting all these sites into a single Git repository and optimizing it with git gc - you may be surprised to find that the actual size of the Git object store is easily 10 times smaller than you expect.
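If you want to check how much this actually buys you before committing to the approach, an experiment along these lines would show it (the path is a placeholder for a scratch copy containing all the sites):

    # In a test repository that contains all the sites, repack
    # aggressively so delta compression can work across them...
    cd /tmp/all-sites-test
    git gc --aggressive
    # ...then inspect the size of the packed object store.
    git count-objects -vH

The size-pack field reported by git count-objects is the number to compare against the 300GB of raw site data.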
Upvotes: 0