Reputation: 6246
I saw the following interesting usage of tar in a co-worker's Bash scripts:
`tar cf - * | (cd <dest> ; tar xf - )`
Apparently it works much like rsync -av does, but faster. The question arises, how?
-m
EDIT: Can anyone explain why this solution should be preferable to the following?
cp -rfp * dest
Is the former faster?
Upvotes: 13
Views: 4970
Reputation: 31
The PowerTools book has the copy as:
tar cf - * | (cd <dest> && tar xvBf - )
The '&&' is a conditional that checks the return code of the preceding command. That is, if the "cd <dest>" failed, the "tar xf -" would not be executed. I always throw in a -v (verbose) and a -B (reblock input).
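To see why the '&&' matters, here is a small sketch you can run in a scratch area. The directory names (src, run1, run2, missing) are made up for the test, and "missing" is deliberately never created so that the cd fails.

```shell
# Keep errexit off: under `set -e` the failing cd would end the subshell
# and hide the difference between ';' and '&&'.
set +e

work=$(mktemp -d)
mkdir "$work/src" "$work/run1" "$work/run2"
echo hello > "$work/src/a.txt"

# Variant with ';': cd fails, tar still runs and unpacks into the
# directory the subshell was started in (run1).
cd "$work/run1"
(cd "$work/src" && tar cf - a.txt) | (cd "$work/missing" 2>/dev/null; tar xf -)

# Variant with '&&': cd fails, so the extracting tar is never started.
cd "$work/run2"
(cd "$work/src" && tar cf - a.txt) 2>/dev/null | (cd "$work/missing" 2>/dev/null && tar xf -)

stray=$(ls "$work/run1")
safe=$(ls "$work/run2")
echo "with ';'  run1 contains: ${stray:-nothing}"
echo "with '&&' run2 contains: ${safe:-nothing}"
```

With ';' the files land wherever you happened to be standing, which is exactly the kind of mess the conditional avoids.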
I use tar all the time. It is especially useful for copying to a remote system, such as:
tar cvf - . | ssh someone@somemachine '(cd somewhere && tar xBf -)'
Upvotes: 3
Reputation: 5129
I believe tar will do a Windows-style 'merge' operation on deeply nested directories, whereas cp will overwrite sub-directories.
For example if you have the layout:
dir/subdir/file1
and you copy it to a destination that contains:
dir/subdir/file2
Then with copy you will be left with:
dir/subdir/file1
But with the tar command, your destination will contain:
dir/subdir/file1
dir/subdir/file2
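This is easy to check on your own system rather than take on belief. A minimal sketch, using the layout from the example; the top-level names (src, cpdest, tardest) are made up for the test. It prints what each destination holds afterwards. As far as I know, GNU cp -r into an existing directory also leaves unrelated files in place, so on Linux you may well see the same listing for both.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/dir/subdir" "$work/cpdest/dir/subdir" "$work/tardest/dir/subdir"
touch "$work/src/dir/subdir/file1"
# Both destinations already contain a different file in the same subdirectory.
touch "$work/cpdest/dir/subdir/file2" "$work/tardest/dir/subdir/file2"

cd "$work/src"
# The cp command from the question.
cp -rfp * "$work/cpdest"
# The tar pipeline from the question, with && instead of ;.
tar cf - * | (cd "$work/tardest" && tar xf -)

cp_result=$(cd "$work/cpdest" && find . -type f | sort)
tar_result=$(cd "$work/tardest" && find . -type f | sort)
printf 'after cp:\n%s\n\nafter tar:\n%s\n' "$cp_result" "$tar_result"
```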
Upvotes: 1
Reputation:
$ time { tar -cf - * | (cd ../bar; tar -xf - ); }
real 0m4.209s
user 0m0.724s
sys 0m3.380s
$ time { cp * ../baz/; }
real 0m18.727s
user 0m0.644s
sys 0m7.127s

$ time { tar -cf - * | (cd ../bar; tar -xf - ); }
real 3m44.007s
user 0m3.390s
sys 0m25.644s
$ time { cp * ../baz/; }
real 3m11.197s
user 0m0.023s
sys 0m9.576s
My guess is this phenomenon is highly filesystem-dependent. If I'm right you will see a drastic difference between a filesystem that specializes in numerous small files, such as reiserfs 3.6, and a filesystem that is better at handling large files.
(I ran the above tests on HFS+.)
Upvotes: 9
Reputation: 21525
As it happens, a co-worker wrote a nearly identical command into one of our scripts. After I spent some time puzzling over it, I asked why he had used that rather than cp. His answer, as I recall it, was that cp is slow when making a copy from one file system to another.
Whether or not this is true would require more testing than I care to spend on the question, but it makes a certain amount of sense. The first tar process reads from the source device as quickly as possible, only waiting for that device to read. Meanwhile, the second tar process reads from its input pipe and writes as quickly as possible. It might have to wait for input, but if writes on the destination device are slower than reads on the source device it will only wait on the destination device. A single cp command will have to wait on both the source and the destination devices.
On the other hand, modern operating systems do a pretty good job of pre-caching IO operations. It's entirely possible cp will spend most of its time waiting on writes and getting reads from memory rather than the device itself. It seems like one would need really solid data to choose two tar commands over the more straightforward cp command.
Upvotes: 0
Reputation: 6163
If you have GNU cp (which all Linux-based systems will), then cp --archive will work, even on hard-linked files, and tar is not needed.
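A quick way to confirm this on your own machine is to copy a pair of hard-linked files and compare inode numbers in the copy. This is only a sketch and assumes GNU cp; the file names are borrowed from the experiment elsewhere in this thread, and other cp implementations may not accept --archive or may behave differently.

```shell
work=$(mktemp -d)
mkdir "$work/src"
touch "$work/src/foo"
ln "$work/src/foo" "$work/src/foo-h"   # foo-h is a hard link to foo

# --archive is the long spelling of -a in GNU cp (assumption: GNU coreutils).
cp --archive "$work/src" "$work/dest"

# If the link survived, both names in the copy share one inode number.
inode_foo=$(ls -i "$work/dest/foo" | awk '{print $1}')
inode_foo_h=$(ls -i "$work/dest/foo-h" | awk '{print $1}')
echo "foo: $inode_foo  foo-h: $inode_foo_h"
```

Equal inode numbers mean the two names are still one file in the destination.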
Upvotes: 0
Reputation: 20634
The tar solution will preserve symbolic links, whereas cp will just make copies and destroy the links.
tar has been a standard Unix utility a lot longer than rsync. You're more likely to find it in a situation when a directory hierarchy needs to be copied to another location (even another computer). rsync is probably easier to use these days, but is slower because it compares the source and destination and syncs them. tar just copies in one direction.
Upvotes: 0
Reputation: 4523
On the difference between cp and tar to copy the directory hierarchies, a simple experiment can be conducted to show the difference:
alastair box:~/hack/cptest [1134]% mkdir src
alastair box:~/hack/cptest [1135]% cd src
alastair box:~/hack/cptest/src [1136]% touch foo
alastair box:~/hack/cptest/src [1137]% ln -s foo foo-s
alastair box:~/hack/cptest/src [1138]% ln foo foo-h
alastair box:~/hack/cptest/src [1139]% ls -a
total 0
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo-h
lrwxrwxrwx 1 alastair alastair 3 Nov 25 14:59 foo-s -> foo
alastair box:~/hack/cptest/src [1142]% mkdir ../cpdest
alastair box:~/hack/cptest/src [1143]% cp -rfp * ../cpdest
alastair box:~/hack/cptest/src [1144]% mkdir ../tardest
alastair box:~/hack/cptest/src [1145]% tar cf - * | (cd ../tardest ; tar xf - )
alastair box:~/hack/cptest/src [1146]% cd ..
alastair box:~/hack/cptest [1147]% ls -l cpdest
total 0
-rw-r--r-- 1 alastair alastair 0 Nov 25 14:59 foo
-rw-r--r-- 1 alastair alastair 0 Nov 25 14:59 foo-h
lrwxrwxrwx 1 alastair alastair 3 Nov 25 15:00 foo-s -> foo
alastair box:~/hack/cptest [1148]% ls -l tardest
total 0
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo-h
lrwxrwxrwx 1 alastair alastair 3 Nov 25 15:00 foo-s -> foo
The difference is in the hard-linked files. Notice how the hard-linked files are copied individually with cp and together with tar. To make the difference more obvious, have a look at the inodes for each:
alastair box:~/hack/cptest [1149]% ls -i cpdest
24690722 foo 24690723 foo-h 24690724 foo-s
alastair box:~/hack/cptest [1150]% ls -i tardest
24690801 foo 24690801 foo-h 24690802 foo-s
There are probably other reasons to prefer tar, but this is one big one, at least if you have extensively hard-linked files.
Upvotes: 10
Reputation: 20174
Some old versions of cp didn't have -f / -p (and similar) options for preserving permissions, so this tar trick did the job.
Upvotes: 1
Reputation: 7732
tar cf - *
This uses tar to send * to stdout
|
This does the obvious redirect of stdout to...
(cd <dest> ; tar xf - )
This, which changes PWD to the appropriate location and then extracts from stdin
I do not know why this would be faster than rsync, as there is no compression involved.
Upvotes: 0
Reputation: 507413
tar cf - * | (cd <dest> ; tar xf - )
is going to tar all non-hidden files/directories of the current directory to stdout, then pipe that into a new subshell's stdin. That shell first changes the current working directory to <dest>, and then untars the archive into that directory.
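The "non-hidden" part comes from the shell, not from tar: the * is expanded before tar ever runs and, by default, does not match names starting with a dot. A small sketch to illustrate, with made-up directory names (src, star, dot); passing . instead of * is one common way to include dotfiles.

```shell
work=$(mktemp -d)
mkdir "$work/src" "$work/star" "$work/dot"
touch "$work/src/visible" "$work/src/.hidden"
cd "$work/src"

# "*" is expanded by the shell and skips names that start with a dot.
tar cf - * | (cd "$work/star" && tar xf -)

# "." hands tar the directory itself, so dotfiles come along.
tar cf - . | (cd "$work/dot" && tar xf -)

# ls -A lists dotfiles too (but not . and ..).
star_files=$(ls -A "$work/star" | tr '\n' ' ')
dot_files=$(ls -A "$work/dot" | tr '\n' ' ')
echo "copied with *: $star_files"
echo "copied with .: $dot_files"
```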
Upvotes: 1
Reputation: 23959
This is a unique usage of pipes. Basically, the first tar typically writes directly to a file, but instead it's going to write to stdout (the -), which is then redirected to the other tar which takes stdin rather than a file. Basically this is the same thing as tarring to a file and untarring later, except without the file in between.
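To make the "same thing without the file in between" point concrete, here is a sketch that does it both ways and compares the results. The paths (src, via_file, via_pipe, tmp.tar) are invented for the test.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/via_file" "$work/via_pipe"
echo one > "$work/src/a.txt"
echo two > "$work/src/sub/b.txt"
cd "$work/src"

# Two steps, with an archive file in between.
tar cf "$work/tmp.tar" *
(cd "$work/via_file" && tar xf "$work/tmp.tar")

# One step: "-" makes the first tar write to stdout and the second read stdin.
tar cf - * | (cd "$work/via_pipe" && tar xf -)

# diff -r exits 0 only when both trees have the same files and contents.
if diff -r "$work/via_file" "$work/via_pipe" >/dev/null; then
    same=yes
else
    same=no
fi
echo "identical trees: $same"
```

The piped form saves the disk space and the extra pass over the data that the temporary archive would need.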
Upvotes: 2
Reputation: 532755
It writes the archive to standard output, then pipes it to a subprocess -- wrapped by the parentheses -- that changes to a different directory and reads/extracts from standard input. That's what the dash character after the f argument means. It's basically copying all the visible files and subdirectories of the current directory to another directory.
Upvotes: 13