Aaron Digulla

Reputation: 328840

Compare two folders and copy/link unique entries to a new folder

How can I copy all unique files from two source folders to a new destination folder?

As a set operation: How can I compute the difference between two folders?

Upvotes: 2

Views: 1830

Answers (4)

Aaron Digulla

Reputation: 328840

At first, I thought I could solve this with clever usage of rsync but nothing really worked.

So my final solution was a small Python script (gist).

Upvotes: 0

Eran Ben-Natan

Reputation: 2615

You can try this:

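cd <First Dir>
find . -type f | sort > /tmp/first.dat    # comm requires sorted input
cd <Second Dir>
find . -type f | sort > /tmp/second.dat
# files found only in the first directory
comm -23 /tmp/first.dat /tmp/second.dat | while IFS= read -r line; do cp "<First Dir>/$line" "<New Dir>" ; done
# files found only in the second directory
comm -13 /tmp/first.dat /tmp/second.dat | while IFS= read -r line; do cp "<Second Dir>/$line" "<New Dir>" ; done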
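
The same idea can be wrapped in a function. This is only a sketch: the name copy_unique and its three arguments are made up for illustration, and it assumes file names contain no newlines. It sorts both lists under one locale because comm expects sorted input, and it recreates subdirectories below the destination instead of flattening everything into one folder.

```shell
# copy_unique SRC_A SRC_B DEST   (hypothetical helper, not from the answer)
# Copy every file whose relative path occurs in only one of SRC_A and SRC_B
# into DEST, keeping its relative path. Assumes no newlines in file names.
copy_unique() (
  a=$1 b=$2 dest=$3
  list_a=$(mktemp) && list_b=$(mktemp) || exit 1
  export LC_ALL=C    # same collation order for sort and comm
  (cd "$a" && find . -type f | sort) > "$list_a"
  (cd "$b" && find . -type f | sort) > "$list_b"
  mkdir -p "$dest"
  # comm -23 keeps the lines found only in the first list
  comm -23 "$list_a" "$list_b" | while IFS= read -r rel; do
    mkdir -p "$dest/$(dirname "$rel")"
    cp "$a/$rel" "$dest/$rel"
  done
  # comm -13 keeps the lines found only in the second list
  comm -13 "$list_a" "$list_b" | while IFS= read -r rel; do
    mkdir -p "$dest/$(dirname "$rel")"
    cp "$b/$rel" "$dest/$rel"
  done
  rm -f "$list_a" "$list_b"
)
```

Call it as copy_unique dirA dirB newDir. The body runs in a subshell (note the parentheses instead of braces), so the LC_ALL setting does not leak into the calling shell.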

Upvotes: 3

ruakh

Reputation: 183602

To copy all files from foo/ and bar/ to baz/, the simplest way is just to copy both, and let one overwrite the other:

mkdir -p baz/
# foo/. copies the contents of foo/ instead of creating baz/foo/
cp --recursive foo/. baz/
cp --recursive bar/. baz/

If you want to be a bit cleaner, and not copy from bar/ anything that exists in foo/, you could write:

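mkdir -p baz/
cp --recursive foo/. baz/
( cd bar/
  # pass each path as $1 instead of splicing {} into the script text
  find . -type f -exec bash -c ' if ! [[ -e ../foo/"$1" ]] ; then
                                   mkdir -p ../baz/"$(dirname "$1")"
                                   cp "$1" ../baz/"$1"
                                 fi
                               ' _ {} \;
)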

You can use the same approach to generate a list of files in bar/ that don't exist in foo/:

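( cd bar/
  # pass each path as $1 instead of splicing {} into the script text
  find . -exec bash -c ' if ! [[ -e ../foo/"$1" ]] ; then
                           echo bar/"${1#./}"
                         fi
                       ' _ {} \;
)

(or you could change echo bar/"${1#./}" to printf "%s\0" bar/"${1#./}" to use a zero-valued byte, rather than a newline, as a separator; the format string must be quoted, otherwise the shell strips the backslash).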

Alternatively, for some variety, you could write:

diff --old-line-format=%L --new-line-format= --unchanged-line-format= \
     <( cd foo/ ; find | sort ) <( cd bar/ ; find | sort )

which passes the outputs of cd foo/ ; find | sort and cd bar/ ; find | sort to diff as the input-files, and tells diff to print the lines that are found only in the first input-file and discard everything else. (Note: this will break if any filenames contain newlines.)

None of the above compares the contents of the different files, simply because I'm not sure what should be done if they're different. Examining file-contents could use diff -r -q foo/ bar/ as a starting-point, but what do we do with that?
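
One possible starting point, offered only as a sketch: rather than parsing the output of diff -r -q, walk foo/ and use cmp -s to flag files that exist in both trees but differ. The function name list_conflicts is hypothetical, and the loop assumes file names without newlines. What to do with the resulting list is still up to you.

```shell
# list_conflicts DIR_A DIR_B   (hypothetical helper)
# Print the relative path of every regular file that exists in both trees
# but whose bytes differ. Assumes no newlines in file names.
list_conflicts() (
  a=$1 b=$2
  (cd "$a" && find . -type f) | while IFS= read -r rel; do
    # cmp -s is silent and exits non-zero when the two files differ
    if [ -f "$b/$rel" ] && ! cmp -s "$a/$rel" "$b/$rel"; then
      printf '%s\n' "$rel"
    fi
  done
)
```

Running list_conflicts foo bar prints one path per line, relative to the two directories.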

Upvotes: 1

Attila

Reputation: 28802

I'm sure there are other ways (without the extra file operations suggested here), but this is a relatively easy way to accomplish it.

Assumptions:
A1) Only interested in the direct contents of the folder.
A2) Files with the same name are assumed to have identical content.

1) create/use an empty temporary directory (tmp)
2) copy the contents of sourceDir1 to tmp
3) delete the contents of sourceDir2 from tmp
-- Now you have the unique files of sourceDir1 in tmp
4) move the contents of tmp into the desired location
5) repeat steps 2)-4) with the roles of sourceDir1 and sourceDir2 swapped
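
The steps above could be sketched in shell roughly as follows. The function names unique_into and unique_both_ways are invented for this example, it relies on assumptions A1 and A2 (direct contents only, same name means same content), and it uses the -maxdepth option of find, which GNU and BSD find support but POSIX does not specify.

```shell
# unique_into SRC OTHER DEST   (hypothetical helper covering steps 1-4)
unique_into() (
  src=$1 other=$2 dest=$3
  # 1) create an empty temporary directory
  tmp=$(mktemp -d) || exit 1
  # 2) copy the direct files of SRC into it
  find "$src" -maxdepth 1 -type f -exec cp {} "$tmp/" \;
  # 3) delete every name that also occurs directly in OTHER
  (cd "$other" && find . -maxdepth 1 -type f) | while IFS= read -r rel; do
    rm -f "$tmp/$rel"
  done
  # 4) move what is left (unique to SRC) into DEST
  mkdir -p "$dest"
  find "$tmp" -maxdepth 1 -type f -exec mv {} "$dest/" \;
  rm -rf "$tmp"
)

# 5) run it once in each direction
unique_both_ways() {
  unique_into "$1" "$2" "$3" &&
  unique_into "$2" "$1" "$3"
}
```

For example, unique_both_ways sourceDir1 sourceDir2 destDir. The source folders are only read from, never modified.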

Notes:
N1) You can use ls to list files (or directories), and redirect it to a file (say s1.tmp). Then you can compare the list of files (directories) of the other folder by using grep to see if the current file (directory) is listed in s1.tmp. You can use this technique to calculate what directories to enter for recursive processing (thus relaxing A1)).
N2) If the files in question are text files, you can use diff to see if they are identical. If so, proceed as before; otherwise handle the case of identical file name but different content appropriately (e.g. copy both files to the destination directory using unique extensions to indicate their source; the logic here depends on your goal).
N3) You can compare binary files as well apparently, see stackoverflow#4013223 and superuser#135911
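
As an illustration of N2 and N3, here is a rough sketch that keeps both versions of a same-named file when the contents differ. The function name keep_both and the suffixes .src1 and .src2 are arbitrary choices for this example. It uses cmp -s, which compares bytes and therefore works for binary files as well, and it only looks at the direct, non-hidden contents of the first folder.

```shell
# keep_both SRC1 SRC2 DEST   (hypothetical helper)
# For each direct file present in both sources with differing bytes, copy
# both versions to DEST as NAME.src1 and NAME.src2.
keep_both() (
  s1=$1 s2=$2 dest=$3
  mkdir -p "$dest"
  for f in "$s1"/*; do
    name=$(basename "$f")
    # skip anything that is not a regular file in both folders
    [ -f "$f" ] && [ -f "$s2/$name" ] || continue
    if ! cmp -s "$f" "$s2/$name"; then
      cp "$f" "$dest/$name.src1"
      cp "$s2/$name" "$dest/$name.src2"
    fi
  done
)
```

Files that are byte-for-byte identical are skipped here, since the steps above already treat them as duplicates.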

Upvotes: 1
