Reputation: 645
As the title says, given a package (the size usually varies between 500 MB and 1 GB), I would like to copy it to around 40 servers at the same time (concurrently). So far I've been using a script that runs one copy at a time, so I'm considering these possibilities:
1- Use the multiprocessing library and create one process per copy so they run concurrently (a sketch follows below), although I think I might end up with an I/O bottleneck, and the processes cannot share the same data.
2- I'm not using a single internet connection, but a large corporate WAN.
Can anyone tell me whether there is a more effective (faster) way to achieve the same thing, or some other way to solve it? (I can run this task from a 2-core workstation.)
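A minimal sketch of option 1, assuming the package is pushed with scp over pre-configured SSH keys (scp, the paths and the host names are my assumptions, not part of the question):

    import subprocess
    from multiprocessing import Pool

    PACKAGE = "/path/to/package.tar.gz"     # hypothetical package path
    SERVERS = ["server01", "server02"]      # ... up to ~40 hosts

    def copy_to(host):
        # Each worker process runs its own scp; nothing needs to be shared.
        result = subprocess.run(["scp", PACKAGE, host + ":/tmp/"])
        return host, result.returncode

    if __name__ == "__main__":
        # 10 workers avoids splitting the upload link 40 ways at once;
        # tune the number to the bandwidth you actually measure.
        with Pool(processes=10) as pool:
            for host, rc in pool.imap_unordered(copy_to, SERVERS):
                print(host, "ok" if rc == 0 else "failed (%d)" % rc)

Note that the workers only pass host names and return codes around, so the "processes cannot share the same data" concern doesn't really bite here.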
Upvotes: 1
Views: 1411
Reputation: 24892
Assume your machines have 1 Gbit connections. You'll get 800 Mbit/s if you're lucky and work at it, so it'll take ~10 s to copy each 1 GByte and 6-7 minutes to update all 40 machines one after another. If that's good enough, the only thing you need to do is use that 1 Gbit efficiently enough to hit the target (what are you seeing from your current scripts? Granted, 1 Gbit may be ambitious on a WAN, but you can do the same analysis with your actual bandwidth). Multiprocessing might or might not help here... but it's not going to magically get you more bandwidth.
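For reference, the arithmetic behind those figures (using the assumed 800 Mbit/s effective rate from above):

    size_bits = 1 * 8 * 10**9          # 1 GByte expressed in bits
    rate_bps = 800 * 10**6             # assumed 800 Mbit/s effective throughput
    per_copy = size_bits / rate_bps    # ~10 s per machine
    print(per_copy, "s per copy,", per_copy * 40 / 60.0, "min for 40 machines")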
If that's not good enough, I'd consider either of these:
Go P2P (see miku's answer), so that as soon as one machine has a piece of the data it can share it with the other machines using its own bandwidth. How much this helps depends to some extent on your network topology (the existence of other bottleneck points).
Look into multicast, if the network is sufficiently under your control that you can get the traffic routed appropriately (this seems pretty unlikely on a WAN, but maybe one day in an IPv6 wonderland...). Instead of copying the same data 40 times (assuming it is the same each time), you send it once and all the receivers pick it up simultaneously; a bare-bones UDP example follows below. Multicast UDP isn't reliable on its own (it's intended more for things like IPTV, I think), but there have been attempts to build reliable file transfer tools on top of multicast, e.g. OpenPGM and MS's own implementation.
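Here is that bare-bones multicast UDP sketch, just to illustrate the "send once, everyone receives" idea; it is not reliable file transfer (no ordering, no retransmission; that's the layer tools like OpenPGM add), and the group address and port are arbitrary examples:

    import socket
    import struct

    GROUP, PORT = "224.1.1.1", 5007

    def send(data):
        # Sender: one datagram addressed to the multicast group.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
        s.sendto(data, (GROUP, PORT))

    def receive():
        # Receiver: bind the port and join the multicast group on all interfaces.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", PORT))
        mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        while True:
            print(s.recv(65535))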
Upvotes: 0
Reputation: 188054
1) I have no experience with this, but it looks like a fit for your use case:
sendfile(2) is a system call which provides a "zero-copy" way of copying data from one file descriptor to another (a socket). The phrase "zero-copy" refers to the fact that all of the copying of data between the two descriptors is done entirely by the kernel, with no copying of data into userspace buffers. This is particularly useful when sending a file over a socket (e.g. FTP).
and
When do you want to use it? Basically any application sending files over the network can take advantage of sendfile(2).
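For what it's worth, Python exposes this as os.sendfile (on Linux, Python 3.3+). Below is a minimal sketch of pushing one file over a TCP connection with it; the host, port and path are placeholders, and the receiving side still has to read from its socket and write to disk:

    import os
    import socket

    def push(path, host, port=9000):
        with socket.create_connection((host, port)) as sock, open(path, "rb") as f:
            size = os.fstat(f.fileno()).st_size
            offset = 0
            while offset < size:
                # The kernel copies from the page cache straight to the socket;
                # no read()/write() loop through userspace buffers.
                sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
                if sent == 0:
                    break
                offset += sent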
2) Another option would be to use a torrent library. I recently learned (skip to 31:00 for the torrent part) that Facebook distributes its daily software updates via torrent (updating thousands of servers with 1.5 GB binaries within 15 minutes or so).
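A rough outline with the python-libtorrent bindings (libtorrent-rasterbar); the exact call names vary between libtorrent releases, and the paths and tracker URL are made up, so treat this as a sketch rather than a tested recipe:

    import libtorrent as lt

    # 1) On the machine that has the package: build and save a .torrent.
    fs = lt.file_storage()
    lt.add_files(fs, "/srv/package.tar.gz")
    t = lt.create_torrent(fs)
    t.add_tracker("http://tracker.internal:6969/announce")
    lt.set_piece_hashes(t, "/srv")              # directory containing the file
    with open("package.torrent", "wb") as f:
        f.write(lt.bencode(t.generate()))

    # 2) On every server (the original host keeps seeding):
    ses = lt.session()
    h = ses.add_torrent({"ti": lt.torrent_info("package.torrent"),
                         "save_path": "/srv"})
    # ...then poll h.status() until the torrent reports finished/seeding.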
Upvotes: 1