Sync services like Dropbox, theory behind file indexing?

Question

I have realised that by using the Amazon S3 service directly, I can save myself a lot of money. Instead of buying a client like GoodSync or Jungle Disk, I thought it would be interesting to create my own Windows syncing application that syncs my files to S3.

I have discovered that I can use FileSystemWatcher to monitor changes to files and directories, but I am looking for the theory behind how services like Dropbox index their files. For example: comparing the size of a file with the size recorded in an index stored on the client PC, then using that to decide whether or not to sync it.
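A minimal sketch of that size-based comparison, extended with the file's last-write time and a content hash. It assumes a hypothetical `IndexEntry` record stored per file in whatever local index you choose; `IndexEntry` and `ChangeDetector` are illustrative names, not part of Dropbox or any library. Size and timestamp are cheap to read, so they serve as a first filter, and a SHA-256 hash is computed only when they differ, so a file that was merely touched is not re-uploaded.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical per-file record kept in the local index (one row per tracked file).
public record IndexEntry(string FilePath, long Size, DateTime LastWriteUtc, string Sha256);

public static class ChangeDetector
{
    // Hex-encoded SHA-256 of the file's contents (Convert.ToHexString needs .NET 5+).
    public static string HashFile(string path)
    {
        using var stream = File.OpenRead(path);
        using var sha = SHA256.Create();
        return Convert.ToHexString(sha.ComputeHash(stream));
    }

    // Capture the file's current state so it can be written to the index after an upload.
    public static IndexEntry Snapshot(string path)
    {
        var info = new FileInfo(path);
        return new IndexEntry(path, info.Length, info.LastWriteTimeUtc, HashFile(path));
    }

    // True if the file should be uploaded; 'indexed' is null when the file was never seen.
    public static bool NeedsSync(string path, IndexEntry? indexed)
    {
        if (indexed is null) return true;                 // new file

        var info = new FileInfo(path);
        if (info.Length == indexed.Size && info.LastWriteTimeUtc == indexed.LastWriteUtc)
            return false;                                 // cheap check passes: assume unchanged

        // Size or timestamp differs: confirm with a content hash before re-uploading.
        return HashFile(path) != indexed.Sha256;
    }
}
```

After a successful upload (or a hash match), you would store a fresh `Snapshot` in the index so the cheap check passes next time. Trusting matching size and timestamp is a trade-off: an edit that preserves both goes unnoticed, which is the same assumption rsync's default quick check makes.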

I am using C#, and references to libraries or code samples I could use would be helpful, but mainly I am looking for the best way to index files and for someone to point me in the right direction.

Thanks

cbmeeks · Accepted Answer

I've gone down this path myself. In fact, now that Mozy has dropped its unlimited plan and Carbonite chooses NOT to back up certain files (like 3GP and *.dat files) unless you routinely go in and add them manually, I am very disgruntled with online backups.

But your question was about syncing. Dropbox does it best, but it's expensive. Then again, I'm not sure S3 would be any cheaper.

Anyway, you will face a lot of hurdles. In my experience, the problems I ran into were:

1) Propagating deletes

2) FileSystemWatcher simply missing events, for example when files are rapidly added to a folder and then deleted

3) etc.
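Both problems can be softened by treating FileSystemWatcher only as a hint and periodically reconciling the index against a full directory scan. FileSystemWatcher raises its `Error` event when its internal buffer overflows, which is a natural trigger for such a rescan. The sketch below assumes the index is an in-memory dictionary keyed by path relative to the sync root; `FileState`, `ScanDiff` and `Reconciler` are hypothetical names. Paths that are in the index but no longer on disk are exactly the deletes that still need propagating.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical per-file state: enough to detect most changes cheaply.
public record FileState(long Size, DateTime LastWriteUtc);

public record ScanDiff(List<string> Added, List<string> Modified, List<string> Deleted);

public static class Reconciler
{
    // Full scan: size and last-write time keyed by path relative to the sync root.
    public static Dictionary<string, FileState> Scan(string root)
    {
        var result = new Dictionary<string, FileState>(StringComparer.OrdinalIgnoreCase);
        foreach (var path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            var info = new FileInfo(path);
            result[Path.GetRelativePath(root, path)] = new FileState(info.Length, info.LastWriteTimeUtc);
        }
        return result;
    }

    // Compare the last saved index with a fresh scan. Anything in the index but not on
    // disk is a delete that still has to be propagated to S3.
    public static ScanDiff Diff(IReadOnlyDictionary<string, FileState> index,
                                IReadOnlyDictionary<string, FileState> current)
    {
        var added = current.Keys.Where(k => !index.ContainsKey(k)).OrderBy(k => k).ToList();
        var deleted = index.Keys.Where(k => !current.ContainsKey(k)).OrderBy(k => k).ToList();
        var modified = current
            .Where(kv => index.TryGetValue(kv.Key, out var old) && old != kv.Value)
            .Select(kv => kv.Key)
            .OrderBy(k => k)
            .ToList();
        return new ScanDiff(added, modified, deleted);
    }

    // Treat the watcher as a hint only: individual events mark paths dirty, and a buffer
    // overflow (reported through the Error event) triggers a full Scan + Diff instead.
    public static FileSystemWatcher Watch(string root, Action<string> markDirty, Action fullRescan)
    {
        var watcher = new FileSystemWatcher(root)
        {
            IncludeSubdirectories = true,
            InternalBufferSize = 64 * 1024,   // the maximum allowed; still not a guarantee
        };
        watcher.Created += (s, e) => markDirty(e.FullPath);
        watcher.Changed += (s, e) => markDirty(e.FullPath);
        watcher.Deleted += (s, e) => markDirty(e.FullPath);
        watcher.Renamed += (s, e) => { markDirty(e.OldFullPath); markDirty(e.FullPath); };
        watcher.Error += (s, e) => fullRescan();
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

Running `Diff(savedIndex, Scan(root))` at startup also catches anything that changed while the app was not running, which no watcher can see.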

Now some ideas on how I would tackle this again:

1) Keep a small SQLite db of file names/paths locally

2) Copy files to a tmp directory before sending them to S3

3) On file changes/updates/deletions/etc., store that meta information in SQLite
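A minimal sketch of the last two ideas, assuming an in-memory list stands in for the SQLite table so the example stays self-contained; `ChangeKind`, `ChangeRecord` and `ChangeJournal` are hypothetical names, and a real client would persist the rows instead. Recording deletes as rows (tombstones) rather than just removing the file's entry is what lets a later upload pass propagate them to S3, and copying to a staging directory gives the uploader a stable snapshot even if the user keeps editing the original.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public enum ChangeKind { Created, Modified, Deleted }

// One row of the change log; in a real client this would be a row in a local SQLite table.
public record ChangeRecord(long Id, string RelativePath, ChangeKind Kind, DateTime AtUtc, bool Uploaded);

public class ChangeJournal
{
    private readonly List<ChangeRecord> _rows = new();
    private long _nextId = 1;

    // Log every event, including deletes (tombstones), instead of just dropping the entry.
    public void Record(string relativePath, ChangeKind kind) =>
        _rows.Add(new ChangeRecord(_nextId++, relativePath, kind, DateTime.UtcNow, false));

    // Latest un-uploaded change per path, oldest first; a create followed by a delete
    // collapses to just the delete.
    public List<ChangeRecord> Pending() =>
        _rows.Where(r => !r.Uploaded)
             .GroupBy(r => r.RelativePath)
             .Select(g => g.OrderBy(r => r.Id).Last())
             .OrderBy(r => r.Id)
             .ToList();

    // Mark a processed change, and any older rows for the same path, as done.
    public void MarkUploaded(ChangeRecord processed)
    {
        for (int i = 0; i < _rows.Count; i++)
            if (_rows[i].RelativePath == processed.RelativePath && _rows[i].Id <= processed.Id)
                _rows[i] = _rows[i] with { Uploaded = true };
    }

    // Copy the file to a private staging directory so the upload reads a stable snapshot.
    // File.Copy can throw an IOException if another process has the source locked,
    // so a real client would retry later.
    public static string StageForUpload(string sourcePath, string stagingDir)
    {
        Directory.CreateDirectory(stagingDir);
        var staged = Path.Combine(stagingDir, Guid.NewGuid().ToString("N") + Path.GetExtension(sourcePath));
        File.Copy(sourcePath, staged, overwrite: false);
        return staged;
    }
}
```

Marking only rows up to the processed `Id` means a change recorded while an upload was in flight stays pending rather than being silently lost.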

Anyway, just some ideas.
