Reputation: 13085
The data set is 97,984 files in 6,766 folders, 2.57 GB in total. A lot of them are binary files.
To me this doesn't sound like much. The daily data change rate is in the hundreds of KB across maybe 50 files. But I'm worried that Subversion will become extremely slow.
It was never fast anyway, and back at v1.2 the recommendation was to split the data into multiple repositories. No, I don't like that idea.
Is there a way to tell Subversion, or any other free open source version control system, to trust the file modification time and file size to detect file changes instead of comparing all the files? With this, and with the data on a fast modern SSD, it should run fast: say, less than 6 seconds for a complete commit (that's 3x longer than getting the summary from the Windows Explorer properties dialog).
Upvotes: 5
Views: 747
Reputation: 24491
I think the best way is to try it for yourself. Mercurial will work fine, since it doesn't compare a file's content if its mtime hasn't changed, which is exactly what you asked for.
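One quick way to convince yourself of that behaviour (my own suggestion, with a placeholder path): time hg status on a clean working copy, then touch a file and time it again; only the touched file should get its content re-read.

    $ time hg status                # clean tree: stat() calls only, no content reads
    $ touch some/large/file.bin     # bump the mtime without changing the content
    $ time hg status                # hg now re-reads just that one file to compare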
Here are the timings (not on an SSD):

Data size: 2.3 GB (84,000 files in 6,000 directories, random textual data)
Checkout time (hg update from the null rev to tip): 1m5s
Status time (after changing 1,800 files, ~35 MB): 3s
Commit time (after the same change): 11s
If you want to avoid a full tree scan during commit, you could try the inotify extension (use the "tip" version, where all known bugs should be fixed).
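Enabling it should just be an hgrc entry along these lines (assuming a Mercurial build that bundles the extension; note the Windows caveat in the edit below):

    [extensions]
    inotify =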
Be aware that cloning such a repository might be painful for your users, since they will have to transfer quite a lot of data.
EDIT: I missed the (implicit) fact that you are running on Windows, so inotify won't work (hopefully it will be ported to Windows in the future, but that's not the case right now).
EDIT 2: added timings
Upvotes: 3
Reputation: 12416
I've just run a benchmark on my machine to see what this is like:
Data size: 2.3 GB (84,000 files in 6,000 directories, random textual data)
Checkout time: 14m
Changed 500 files (14 MB of data changes)
Commit time: 50s
To get an idea of how long it would take to compare all those files manually, I also ran a diff against two exports of that data (version 1 against version 2).
Diff time: 55m
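Something along these lines would reproduce that comparison (repository URL and revisions are placeholders):

    $ svn export -r 1 file:///path/to/repo v1
    $ svn export -r 2 file:///path/to/repo v2
    $ time diff -r v1 v2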
I'm not sure an SSD would get that commit time down as much as you hope, but I was using an ordinary single SATA disk for both the 50-second and 55-minute measurements.
To me, these times strongly suggest that svn does not check file contents by default.
This was with svn 1.6.
Upvotes: 3
Reputation: 66753
"Is there a way that I can tell Subversion or any other free open source version control to trust the file modification time/file size to detect file changes and not compare all the files?"
I think Subversion already does this. Look at this piece of code in libsvn_wc/questions.c (r39196):
    if (! force_comparison)
      {
        svn_filesize_t translated_size;
        apr_time_t last_mod_time;

        /* We're allowed to use a heuristic to determine whether files may
           have changed.  The heuristic has these steps:

           1. Compare the working file's size
              with the size cached in the entries file
           2. If they differ, do a full file compare

           3. Compare the working file's timestamp
              with the timestamp cached in the entries file

           4. If they differ, do a full file compare

           5. Otherwise, return indicating an unchanged file.  */
I sampled a few places where this function is called, and the force_comparison parameter was always FALSE. I only spent a few minutes looking, though.
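For illustration only, here is roughly what those five steps look like as standalone C. The struct, field, and function names are mine, not SVN's, and contents_differ is a stub standing in for the expensive byte-by-byte compare:

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* Metadata as a version control tool might cache it in its
       "entries" file; these names are illustrative. */
    struct cached_entry {
        off_t  size;    /* size recorded at last commit/update */
        time_t mtime;   /* timestamp recorded at last commit/update */
    };

    /* Stand-in for the expensive full-content comparison; a real
       implementation would read the file and compare byte by byte. */
    static bool contents_differ(const char *path, const struct cached_entry *e)
    {
        (void)path;
        (void)e;
        return true;
    }

    /* Returns true if the working file must be treated as modified.
       Mirrors steps 1-5 above: the full compare only runs when the
       cheap stat() metadata disagrees with the cache. */
    static bool file_modified(const char *path, const struct cached_entry *e)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return true;                    /* missing/unreadable: treat as changed */

        if (st.st_size != e->size)          /* steps 1 and 2 */
            return contents_differ(path, e);

        if (st.st_mtime != e->mtime)        /* steps 3 and 4 */
            return contents_differ(path, e);

        return false;                       /* step 5: unchanged */
    }

    int main(void)
    {
        struct cached_entry e = { 0, 0 };   /* pretend cache entry for the demo */
        printf("%s\n", file_modified("example.txt", &e) ? "modified" : "unchanged");
        return 0;
    }

The point is that the expensive path is only taken when the cheap stat() metadata disagrees with the cache, which fits the commit timings measured above.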
Upvotes: 3