john

Reputation: 11709

Speed up file transfers from one machine to another

I have to copy around 25 files from one machine to another. Each file is around 15 GB. I have a 1GB link, and both machines are very powerful boxes with around 40 CPUs each. Right now, copying all the files takes me 50 minutes.

Below is the script I run on the box that is supposed to receive the files. It copies 15 files into /data01/test_primary and the other 10 into /data02/test_secondary. The logic is simple: I figure out which local machine to copy the data from, and if that machine is down I fall back to a remote machine and copy from there.

export PRIMARY=/data01/test_primary
export SECONDARY=/data02/test_secondary
export dir3=/bat/data/snapshot/20180227
PRIMARY_FILES=(685 959 682 679 688 651 909 906 657 881 884 878 853 707 847)
SECONDARY_FILES=(950 883 887 890 1001 994 997 1058 981 833)

export LOCATION_1="machineA"
export LOCATION_2="machineB"
export LOCATION_3="machineC"

do_Copy() {
  el=$1
  PRIMSEC=$2
  scp golden@"$LOCATION_1":"$dir3"/proc_"$el"_5.data "$PRIMSEC"/. ||
    scp golden@"$LOCATION_2":"$dir3"/proc_"$el"_5.data "$PRIMSEC"/. ||
    scp golden@"$LOCATION_3":"$dir3"/proc_"$el"_5.data "$PRIMSEC"/. ||
    exit 1
}
export -f do_Copy
parallel -j 5 do_Copy {} "$PRIMARY" ::: "${PRIMARY_FILES[@]}" &
parallel -j 5 do_Copy {} "$SECONDARY" ::: "${SECONDARY_FILES[@]}" &
wait

echo "All copied."

I believe the main problem with my script is that it opens a separate scp connection for each file, which adds a lot of needless overhead. Is there anything I can optimize here to make the copy faster? At the moment I am combining scp with GNU parallel to achieve parallelism.

What options do I have to speed things up? I am ready to try different approaches and see whether they help.
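One way to cut the per-file connection overhead without touching the script is SSH connection multiplexing: with ControlMaster enabled, every scp to the same host reuses a single already-open SSH session instead of negotiating a new one. A minimal ~/.ssh/config sketch (the host names are taken from the script above; the socket path and persistence timeout are arbitrary choices):

```
Host machineA machineB machineC
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
```

With this in place, the first scp to a host opens the master connection and the remaining transfers ride on it for up to 10 minutes of idle time.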

Upvotes: 2

Views: 1229

Answers (2)

Ole Tange

Reputation: 33748

I have to copy around 25 files from one machine to other machine.

rsync is good when you only copy differences. From your description it sounds as if the files are new files, and not just updates of existing files.

Do the new files look similar to existing files? In that case you could do:

receiver$ cp existing new
receiver$ rsync sender:new new

Upvotes: 0

Gonzalo Matheu

Reputation: 10104

Enabling scp compression (the -C flag) might speed things up, depending on the data. Bearing in mind that you have a lot of CPUs, the compression itself should not take long.

Another alternative, if possible, would be to use rsync (-z enables compression) instead of scp. rsync adds a few optimizations that make the operation faster, and it also has a special delta-transfer algorithm (useful when updating existing files).
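As a sketch of how this could slot into the question's script, here is a hypothetical drop-in replacement for do_Copy that swaps scp for rsync -z, keeping the same three-host fallback. The hosts, user, and paths mirror the original script; nothing else is assumed.

```shell
# Variables as in the original script
dir3=/bat/data/snapshot/20180227
LOCATION_1=machineA
LOCATION_2=machineB
LOCATION_3=machineC

do_Copy() {
  el=$1
  PRIMSEC=$2
  # Try each host in turn; -z compresses the stream in transit.
  # Adding -P would also give progress output and partial-file resume.
  for loc in "$LOCATION_1" "$LOCATION_2" "$LOCATION_3"; do
    rsync -z golden@"$loc":"$dir3"/proc_"$el"_5.data "$PRIMSEC"/ && return 0
  done
  return 1
}
```

As in the original, the function can then be exported with export -f do_Copy and driven by GNU parallel.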

Upvotes: 1
