Reputation: 887
I have 3 AWS P instances processing some heavy stuff and saving results to relevant /home/user/folder
Also I have a main server with the same folder where I want to collect results from those 3 instances
Each instance works on its own part of the whole task, their results in sub folders not overlapping
Instances are 2 TB each, so I would like to get results from each instance as soon as they appear
This way when its job is done, I won't spend half a day copying results to the main server
I think one way of solving this is running something like this on each instance:
*/30 * * * * rsync /home/user/folder [email protected]:/home/user/folder
Are there any other more smart ways of achieving same results given that all of instances are AWS?
I also thought about (1) detachable storage and (2) storing on S3 but being new to AWS I might overlook some hidden pitfalls in such workflows, especially when it comes to terabytes of data and expensive instances.
How do you collect processed data from remote instances?
Upvotes: 2
Views: 221
Reputation: 2605
I would consider using rclone tool, which can be easy configured for the shared S3 bucket. Just be aware about copy/sync mode. It can rich up to several Gigabit throughput depending on your instance type.
Link for the project: rclone.org
Upvotes: 1
Reputation: 4014
My thoughts on some of the options mentioned in OP and comments, as well as some other ones I thought of:
Upvotes: 1