Igniter

Reputation: 887

Syncing remote folders from several machines to one AWS instance

I have 3 AWS P instances processing some heavy stuff and saving the results to /home/user/folder on each machine.
I also have a main server with the same folder, where I want to collect the results from those 3 instances.
Each instance works on its own part of the whole task, so their results go into subfolders that do not overlap.

The instances have 2 TB each, so I would like to fetch results from each instance as soon as they appear.
That way, when a job is done, I won't spend half a day copying its results to the main server.

I think one way of solving this is running something like this on each instance:

*/30 * * * * rsync /home/user/folder [email protected]:/home/user/folder
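A slightly more defensive variant of that cron job, as a sketch only. It assumes `flock` (from util-linux) is installed; the host name `user@main-server` and the lock file path are hypothetical placeholders. Without `-a` (or `-r`), rsync skips directories, and the trailing slash on the source copies the folder's contents rather than nesting a second `folder` inside the destination. `flock -n` makes a run exit immediately if the previous one is still transferring.

```shell
#!/bin/sh
# Sketch only: DEST and LOCK are hypothetical placeholders.
set -eu

SRC="${SRC:-/home/user/folder/}"
DEST="${DEST:-user@main-server:/home/user/folder/}"
LOCK="${LOCK:-/tmp/results-rsync.lock}"

# Dry run by default: print the command. Set EXECUTE=1 to really run it.
run() {
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

push() {
    # -a recurses and preserves attributes; --partial keeps partially
    # transferred files so an interrupted transfer does not start from zero.
    run flock -n "$LOCK" rsync -a --partial "$SRC" "$DEST"
}

push
```

Saved as a script and called from the crontab line with `EXECUTE=1`, this would replace the bare rsync call.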

Are there any smarter ways of achieving the same result, given that all of the instances are on AWS?
I also thought about (1) detachable storage and (2) storing on S3, but being new to AWS I might overlook some hidden pitfalls in such workflows, especially when it comes to terabytes of data and expensive instances.

How do you collect processed data from remote instances?

Upvotes: 2

Views: 221

Answers (2)

Roman Shishkin

Reputation: 2605

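I would consider using the rclone tool, which can easily be configured for a shared S3 bucket. Just be aware of the difference between its copy and sync modes: copy only adds or updates files at the destination, while sync also deletes destination files that are missing from the source, which matters when several workers upload to the same path. It can reach several gigabits per second of throughput, depending on your instance type.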

Link for the project: rclone.org
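A minimal sketch of what each worker might run, assuming an rclone remote has already been created with `rclone config`. The remote name `s3results` and the bucket name `my-results-bucket` are hypothetical placeholders. `--min-age 1m` skips files modified within the last minute, on the assumption that those may still be being written.

```shell
#!/bin/sh
# Sketch only: "s3results" and "my-results-bucket" are hypothetical names.
set -eu

SRC="${SRC:-/home/user/folder}"
DEST="${DEST:-s3results:my-results-bucket/folder}"

# Dry run by default: print the command. Set EXECUTE=1 to really run it.
run() {
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

upload() {
    # "copy" never deletes anything at the destination, so workers that
    # share one bucket path cannot remove each other's results.
    run rclone copy "$SRC" "$DEST" --transfers 16 --min-age 1m
}

upload
```

The main server could then pull everything down with the same command, with source and destination swapped.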

Upvotes: 1

jingx

Reputation: 4014

My thoughts on some of the options mentioned in OP and comments, as well as some other ones I thought of:

  1. EFS: create an EFS and mount it as an NFS drive on all the instances. It's the easiest but probably costs the most.
  2. s3fs: have all the instances mount the same S3 bucket using s3fs. This is likely the least expensive solution. You also don't need to worry about running out of disk space. The downside is that the performance will not be as good as with mounted NFS drives.
  3. EBS volumes: attach an EBS volume to each worker instance for it to write its results to. When the workers are done, detach the volumes and attach them to the main server. This will be the fastest option and still cheaper than EFS. If you can't or won't do all the detaching and attaching manually, you'll need to write some scripts.
  4. Old school NFS shares: there is nothing wrong with a plain vanilla NFS setup without any of those fancy AWS acronyms. :-)
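
For option 3, the detach/attach step could be scripted roughly like this with the AWS CLI. This is a sketch under assumptions: the volume ID, instance ID and device name are hypothetical placeholders, the filesystem has already been unmounted on the worker, and the volume sits in the same Availability Zone as the main server (EBS volumes can only be attached to instances in their own zone).

```shell
#!/bin/sh
# Sketch only: the IDs and the device name below are hypothetical.
set -eu

VOLUME_ID="${VOLUME_ID:-vol-0123456789abcdef0}"
MAIN_INSTANCE_ID="${MAIN_INSTANCE_ID:-i-0123456789abcdef0}"
DEVICE="${DEVICE:-/dev/sdf}"

# Dry run by default: print each command. Set EXECUTE=1 to really run them.
run() {
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

move_volume() {
    # Release the volume from the worker it is currently attached to.
    run aws ec2 detach-volume --volume-id "$VOLUME_ID"
    # Block until the volume reports the "available" state.
    run aws ec2 wait volume-available --volume-ids "$VOLUME_ID"
    # Attach it to the main server; it still has to be mounted there.
    run aws ec2 attach-volume --volume-id "$VOLUME_ID" \
        --instance-id "$MAIN_INSTANCE_ID" --device "$DEVICE"
}

move_volume
```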

Upvotes: 1
