Reputation: 1844
Consider this scenario: in a load-balanced environment, I have 3 separate instances of a CMS running on 3 different physical servers. All 3 instances share the same database.
On each server, the CMS has a /media folder where all media subfolders and files reside. My question is: how would I implement a file replication service in Golang, so that when a subfolder or file is added, changed, or deleted on one of the servers, the change is copied/replicated/deleted on all the other servers?
What packages would I need to look into, or perhaps you have a small code snippet to help me get started? That would be awesome.
Edit: This question has been marked as a duplicate, but it is not. It is, however, an alternative to setting up a shared network file system. I'm thinking that keeping a copy of each file on every server and keeping them synchronized might be better than sharing a single copy.
Upvotes: 0
Views: 1873
Reputation: 9458
You probably shouldn't do this. Use a distributed file system, object storage (à la S3 or GCS), or a syncing program like btsync or syncthing.
If you still want to do this yourself, it will be challenging. You are basically building a distributed database, and those are difficult to get right.
At first blush you could check out something like etcd or Raft, but unfortunately etcd doesn't handle large files well.
You could, on upload, also copy the file to every other server over ssh. But then what happens when a server is down? Or what happens when two people update the same file at the same time?
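For illustration, here's what that naive fan-out might look like in Go. The peer addresses and the /media upload endpoint are made up, and the sketch deliberately exhibits the failure modes just described (a down peer leaves the cluster inconsistent; concurrent writers race):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// Hypothetical peer addresses; in practice these would come from config.
var peers = []string{"http://server-b:8080", "http://server-c:8080"}

// replicate pushes one file's bytes to every peer. It stops at the first
// failure, leaving the remaining peers out of date.
func replicate(relPath string, contents []byte) error {
	for _, peer := range peers {
		resp, err := http.Post(peer+"/media/"+relPath,
			"application/octet-stream", bytes.NewReader(contents))
		if err != nil {
			return fmt.Errorf("push to %s failed: %w", peer, err)
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("peer %s returned %s", peer, resp.Status)
		}
	}
	return nil
}

func main() {
	if err := replicate("images/logo.png", []byte("file contents")); err != nil {
		fmt.Println("replication failed:", err)
	}
}
```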
Maybe you could design it such that every file gets a unique id (perhaps based on the hash of its contents so you can safely dedupe) and those files can never be updated or deleted, only added. That would solve the simultaneous update problem, but you'd still have the downtime problem.
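A minimal sketch of that content addressing in Go, hashing a file's bytes to derive its ID (the paths are illustrative):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// fileID streams the file through SHA-256 and returns the hex digest.
// Identical uploads produce the same ID, so they dedupe automatically,
// and an ID can never silently point at changed contents.
func fileID(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	id, err := fileID("media/images/logo.png")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("store as:", id) // e.g. under media/objects/<id>
}
```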
One approach would be for each server to maintain an append-only version log, appending an entry whenever a file is added:
VERSION | FILE HASH
1       | abcd123
2       | efgh456
3       | ijkl789
With that, you can pull every file from a server, and from then on a single number is enough to know when files have been added. (For example, if you think Server A is on version 5 and you're informed it is now on version 7, you know you need to sync 2 files.)
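Here's a minimal in-memory sketch of such a log in Go (persistence and networking are left out):

```go
package main

import (
	"fmt"
	"sync"
)

// VersionLog is append-only: entries are never updated or deleted,
// so comparing two version numbers tells you exactly what to pull.
type VersionLog struct {
	mu     sync.Mutex
	hashes []string // hashes[i] is the file added at version i+1
}

// Append records a new file hash and returns the new version number.
func (l *VersionLog) Append(hash string) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.hashes = append(l.hashes, hash)
	return len(l.hashes)
}

// Since returns the hashes added after the given version.
func (l *VersionLog) Since(version int) []string {
	l.mu.Lock()
	defer l.mu.Unlock()
	if version >= len(l.hashes) {
		return nil
	}
	return append([]string(nil), l.hashes[version:]...)
}

func main() {
	var vlog VersionLog
	vlog.Append("abcd123")
	vlog.Append("efgh456")
	vlog.Append("ijkl789")

	// We last saw version 1; versions 2 and 3 are new.
	fmt.Println(vlog.Since(1)) // [efgh456 ijkl789]
}
```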
You could do this with a database table:
ID | LOCAL_SERVER_ID | REMOTE_SERVER_ID | VERSION | FILE HASH
You could poll this table periodically and do your syncing via ssh or http between machines. If a server is down, you just retry until it comes back.
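A rough polling loop might look like this; the sync_log table name, the Postgres driver, and the connection string are all assumptions:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // any database/sql driver will do
)

// pollOnce fetches log entries newer than the highest version we have
// already synced from each remote server. Column names follow the table
// layout above.
func pollOnce(db *sql.DB, localID string, lastSeen map[string]int) error {
	rows, err := db.Query(
		`SELECT remote_server_id, version, file_hash
		   FROM sync_log
		  WHERE local_server_id = $1
		  ORDER BY remote_server_id, version`, localID)
	if err != nil {
		return err
	}
	defer rows.Close()

	for rows.Next() {
		var remote, hash string
		var version int
		if err := rows.Scan(&remote, &version, &hash); err != nil {
			return err
		}
		if version <= lastSeen[remote] {
			continue // already synced
		}
		// A real implementation would pull the blob from `remote` over
		// ssh or http here; on failure, lastSeen stays put and the next
		// tick retries, which covers a server being down.
		log.Printf("need file %s (v%d) from %s", hash, version, remote)
		lastSeen[remote] = version
	}
	return rows.Err()
}

func main() {
	db, err := sql.Open("postgres", "postgres://cms:cms@dbhost/cms?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	lastSeen := map[string]int{}
	for range time.Tick(30 * time.Second) {
		if err := pollOnce(db, "server-a", lastSeen); err != nil {
			log.Println("poll failed, will retry:", err)
		}
	}
}
```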
Or, if you don't want a centralized database for this, you could use a library like memberlist. Each node's local metadata could be its current version.
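A minimal sketch with hashicorp/memberlist, publishing the version number as node metadata (node names and addresses are assumptions):

```go
package main

import (
	"fmt"
	"log"
	"strconv"
	"sync/atomic"

	"github.com/hashicorp/memberlist"
)

// versionDelegate advertises our version counter as node metadata.
type versionDelegate struct{ version atomic.Int64 }

func (d *versionDelegate) NodeMeta(limit int) []byte {
	return []byte(strconv.FormatInt(d.version.Load(), 10))
}

// Unused hooks required by the memberlist.Delegate interface.
func (d *versionDelegate) NotifyMsg([]byte)                           {}
func (d *versionDelegate) GetBroadcasts(overhead, limit int) [][]byte { return nil }
func (d *versionDelegate) LocalState(join bool) []byte                { return nil }
func (d *versionDelegate) MergeRemoteState(buf []byte, join bool)     {}

func main() {
	d := &versionDelegate{}
	cfg := memberlist.DefaultLANConfig()
	cfg.Name = "server-a"
	cfg.Delegate = d

	list, err := memberlist.Create(cfg)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := list.Join([]string{"server-b:7946"}); err != nil {
		log.Println("join failed, will retry:", err)
	}

	// Compare each peer's advertised version against our own; a higher
	// number means that peer has files we still need to pull.
	for _, node := range list.Members() {
		v, _ := strconv.ParseInt(string(node.Meta), 10, 64)
		if v > d.version.Load() {
			fmt.Printf("%s is at v%d; pull missing files from it\n", node.Name, v)
		}
	}
}
```

When the local version advances, memberlist's UpdateNode method can re-broadcast the new metadata so peers see it on the next gossip round.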
Either way, there will be some delay between when a file is uploaded to a single server and when it's available on all of them. Handling that well is hard, which is why you probably shouldn't do this.
Upvotes: 3