Jaearess

Reputation: 675

What's the best way to keep multiple Linux servers synced?

I have several different locations in a fairly wide area, each with a Linux server storing company data. This data changes every day in different ways at each different location. I need a way to keep this data up-to-date and synced between all these locations.

For example:

In one location someone places a set of images on their local server. In another location, someone else places a group of documents on their local server. A third location adds a handful of both images and documents to their server. In two other locations, no changes are made to their local servers at all. By the next morning, I need the servers at all five locations to have all those images and documents.

My first instinct is to use rsync and a cron job to do the syncing overnight (1 a.m. to 6 a.m. or so), when none of the bandwidth at our locations is being used. It seems to me that it would work best to have one server be the "central" server, pulling in all the files from the other servers first and then pushing those changes back out to each remote server. Or is there another, better way to perform this function?
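
Roughly, I'm picturing something like this on the central server, assuming passwordless SSH between the sites (hostnames and paths are made up):

    #!/bin/sh
    # Pull-then-push sync, meant to run from cron on the central server during
    # the overnight window (no --delete, so removals are not propagated).
    REMOTES="site1 site2 site3 site4"
    DATA=/srv/company-data

    # Pull new and changed files from every remote site into the central copy.
    for host in $REMOTES; do
        rsync -az "$host:$DATA/" "$DATA/"
    done

    # Push the merged copy back out to every remote site.
    for host in $REMOTES; do
        rsync -az "$DATA/" "$host:$DATA/"
    done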

Upvotes: 2

Views: 7376

Answers (8)

Jeremy Cantrell

Reputation: 27416

The way I do it (on Debian/Ubuntu boxes):

  • Use dpkg --get-selections to dump the list of installed packages
  • Use dpkg --set-selections on the other machines to install the packages from that list
  • Use a source control solution to manage the configuration files. I use git in a centralized fashion, but subversion could be used just as easily. (All three steps are sketched below.)
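
A rough sketch of those steps (the selections file, repository URL, and paths are placeholders):

    # On the reference machine: dump the list of installed packages.
    dpkg --get-selections > selections.txt

    # On another machine: load that list and install whatever is missing.
    dpkg --set-selections < selections.txt
    apt-get dselect-upgrade

    # Keep configuration files under version control, e.g. /etc in git,
    # pushed to a central repository.
    cd /etc
    git init
    git add .
    git commit -m "initial config snapshot"
    git remote add origin ssh://configserver/srv/git/etc.git
    git push origin master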

Upvotes: 3

Abhinav

Reputation: 1

It depends upon the following:

  • How many servers/computers need to be synced?

    • If there are too many servers, rsync becomes a problem: either you use threads and sync to multiple servers at the same time, or you sync them one after the other. So you are looking at either high load on the source machine or, in the latter case, inconsistent data across the servers (in a cluster) at a given point in time.
  • The size of the folders that need to be synced and how often they change

    • If the data is huge, then rsync will take time.
  • The number of files

    • If the number of files is large, and especially if they are small files, rsync will again take a lot of time.

So it all depends on the scenario: whether to use rsync, NFS, or version control.

  • If there are fewer servers and just a small amount of data, then it makes sense to run rsync every hour. You can also package the content into an RPM if the data changes only occasionally.

With the information provided, IMO version control will suit you best.

Rsync/scp might give you problems if two people upload different files with the same name. NFS over multiple locations needs to be architected very carefully.

Why not have one or more repositories and have everyone simply commit to those repositories? All you need to do is keep the repositories in sync. If the data is huge and updates are frequent, then your repository server will need a good amount of RAM and a good I/O subsystem.

Upvotes: 0

Joe Skora

Reputation: 14920

AFAIK, rsync is your best choice; it supports partial file updates among a variety of other features. Once set up, it is very reliable. You can even set up the cron job with timestamped log files to track what is updated in each run.
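
For example, a crontab entry along these lines (the host and paths are placeholders):

    # Nightly pull from one remote site, with a date-stamped log of what was
    # transferred in that run (% must be escaped inside crontab).
    0 1 * * * rsync -avz site1:/srv/data/ /srv/data/ >> /var/log/rsync-$(date +\%F).log 2>&1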

Upvotes: 2

Aaron H.

Reputation: 6587

I have to agree with Matt McMinn: especially since it's company data, I'd use source control, and depending on the rate of change, run it more often.

I think the central clearinghouse is a good idea.

Upvotes: 0

Dan Udey

Reputation: 2977

One thing you could (theoretically) do is create a script using Python or something and the inotify kernel feature (through the pyinotify package, for example).

You can run the script, which registers to receive events on certain trees. Your script could then watch directories, and then update all the other servers as things change on each one.

For example, if someone uploads spreadsheet.doc to the server, the script sees it instantly; if the document doesn't get modified or deleted within, say, 5 minutes, the script could copy it to the other servers (e.g. through rsync).

A system like this could theoretically implement a sort of limited 'filesystem replication' from one machine to another. Kind of a neat idea, but you'd probably have to code it yourself.
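
The same idea can be sketched in shell with inotifywait from inotify-tools rather than Python/pyinotify; the hosts, paths, and fixed five-minute settle time below are made up, and a real script would debounce events per file:

    #!/bin/sh
    # Watch a tree for completed writes and moved-in files, wait for each file
    # to settle, then copy it out to the other servers with rsync.
    WATCH_DIR=/srv/company-data
    REMOTES="site2 site3 site4 site5"

    inotifywait -m -r -e close_write -e moved_to --format '%w%f' "$WATCH_DIR" |
    while read -r path; do
        # Crude "settle" delay; events are handled one at a time here, whereas
        # a real implementation would track a timer per file.
        sleep 300
        for host in $REMOTES; do
            rsync -az "$path" "$host:$path"
        done
    done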

Upvotes: 2

PiedPiper

Reputation: 5785

rsync would be your best choice. But you need to carefully consider how you are going to resolve conflicts between updates to the same data on different sites. If site-1 has updated 'customers.doc' and site-2 has a different update to the same file, how are you going to resolve it?

Upvotes: 0

Matt McMinn

Reputation: 16291

I don't know how practical this is, but a source control system might work here. At some point (perhaps each hour?) during the day, a cron job runs a commit, and overnight, each machine runs a checkout. You could run into issues with a long commit not being done when a checkout needs to run, and essentially the same thing could be done with rsync.

I guess what I'm thinking is that a central server would make your sync operation easier - conflicts can be handled once on the central server, then pushed out to the other machines.
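
For example, with Subversion (paths and times are placeholders; any VCS with a similar commit/update cycle would do):

    # Hypothetical crontab on each location's server, with the data tree
    # checked out from a central repository at /srv/company-data.
    # Hourly: add any new files and commit local changes.
    0 * * * * cd /srv/company-data && svn add --force --quiet . && svn commit --quiet -m autocommit
    # Overnight: pull in everyone else's changes.
    0 2 * * * cd /srv/company-data && svn update --quiet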

Upvotes: 1

Kyle Burton

Reputation: 27528

An alternative if rsync isn't the best solution for you is Unison. Unison works under Windows, and it has some features for handling the case where there are changes on both sides (so you don't necessarily need to pick one server as the primary, as you've suggested).

Depending on how complex the task is, either may work.
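
A minimal example invocation, assuming SSH access between the sites and made-up paths:

    # Sync the local tree against one remote site; -batch runs non-interactively,
    # propagating non-conflicting changes and skipping conflicts (a -prefer rule
    # can be added to resolve those automatically).
    unison /srv/company-data ssh://site2//srv/company-data -batch -times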

Upvotes: 2
