Reputation: 125
I'm after a little help with some Bash scripting (on OSX). I want to create a script that takes two parameters - source folder and target folder - and checks all files in the source hierarchy to see whether or not they exist in the target hierarchy. i.e. Given a data DVD check whether the files contained on it are already on the internal drive.
What I've come up with so far is
#!/bin/bash
if [ $# -ne 2 ]
then
echo "Usage is command sourcedir targetdir"
exit 0
fi
source="$1"
target="$2"
for f in "$( find $source -type f -name '*' -print )"
do
I'm now not sure how best to obtain the filename without its path and then check whether it exists. I am really a beginner at scripting.
Edit: The answers given so far are all very efficient in terms of compact code. However, I need to look for a file found anywhere within the source hierarchy at any location within the target hierarchy. If it is found, I would like to compare checksums and last-modified dates and report on them; if it is not found, I would like to note that. The purpose is to check whether files on external media have been uploaded to a file server.
Upvotes: 1
Views: 431
Reputation: 107759
A few remarks about the line for f in "$( find $source -type f -name '*' -print )":

- Always use double quotes around variable substitutions, i.e. write "$source". Otherwise the result is split into words that are treated as wildcard patterns (a historical oddity in the shell parsing rules); in particular, this would fail if the value of the variable contains spaces.
- Don't use find that way (a safer looping pattern is sketched just after this list). Because of the double quotes, there would be a single iteration through the loop, with $f containing the complete output from find. Without double quotes, file names containing spaces and other special characters would trip the script.
- -name '*' is a no-op; it matches everything.
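If you do want to handle the files one at a time, a more robust pattern (shown here only as a sketch; the echo is a placeholder for whatever per-file check you end up doing) is to have find print NUL-terminated names and read them back with read -d '':

# Sketch: iterate safely over every file under "$source", coping with
# spaces and other special characters in the names.
while IFS= read -r -d '' f
do
    name=${f##*/}    # the file name with its leading path stripped
    echo "checking: $name"
done < <(find "$source" -type f -print0)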
As far as I understand, you want to look for files by name independently of their location, i.e. you consider /dvd/path/to/somefile to be a match for /internal-drive/different/path-to/somefile. So make a list of files on each side, indexed by name. You can do this by massaging the output of find a little. The code below can cope with any character in file names except newlines.
list_files () {
    find . -type f -print |
    sed 's:^\(.*\)/\(.*\)$:\2/\1/\2:' |
    sort
}
source_files="$(cd "$1" && list_files)"
dest_files="$(cd "$2" && list_files)"
join -t / -v 1 <(echo "$source_files") <(echo "$dest_files") |
sed 's:^[^/]*/::'
The list_files function generates a list of file names with paths, prepending each file's own name in front of its path, so e.g. /mnt/dvd/some/dir/filename.txt will appear as filename.txt/./some/dir/filename.txt. It then sorts the list.

The join command prints out lines like filename.txt/./some/dir/filename.txt when there is a file called filename.txt in the source hierarchy but not in the destination hierarchy. We finally massage its output a little with the trailing sed, since we no longer need the file name at the beginning of the line.
Upvotes: 0
Reputation: 95252
To list only files in $source_dir that do not exist in $target_dir:
comm -23 <(cd "$source_dir" && find .|sort) <(cd "$target_dir" && find .|sort)
You can limit it to just regular files with -type f on the find commands, etc.
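For example, a sketch of the same comparison restricted to regular files (using the same placeholder variables) would be:

# Sketch: compare only regular files, ignoring directories and symlinks.
comm -23 <(cd "$source_dir" && find . -type f | sort) \
         <(cd "$target_dir" && find . -type f | sort)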
The comm command (short for "common") finds lines in common between two text files and outputs three columns: lines only in the first file, lines only in the second file, and lines common to both. The numbers suppress the corresponding columns, so the output of comm -23 is only the lines from the first file that don't appear in the second.
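As a tiny, self-contained illustration of the three columns (the letter lists are made up; the comments annotate where each line lands rather than being literal output):

comm <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n')
# a            <- only in the first list (column 1)
#         b    <- in both lists (column 3)
#         c
#     d        <- only in the second list (column 2)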
The process substitution syntax <(command) is replaced by the pathname to a named pipe connected to the output of the given command, which lets you use a "pipe" anywhere you could put a filename, instead of only stdin and stdout.
The commands in this case generate lists of files under the two directories - the cd makes the output relative to the directories being compared, so that corresponding files come out as identical strings, and the sort ensures that comm won't be confused by the same files listed in different order in the two folders.
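The question's edit also asks about comparing checksums for files that exist in both hierarchies. One way to build on the same approach, shown here only as a sketch (it reuses the placeholder variables above and OS X's md5 -q, which prints just the digest), is to feed the common lines from comm -12 into a loop:

# Sketch: list files present in BOTH trees whose contents differ.
# Assumes $source_dir and $target_dir are set; file names must not
# contain newlines.
comm -12 <(cd "$source_dir" && find . -type f | sort) \
         <(cd "$target_dir" && find . -type f | sort) |
while IFS= read -r f
do
    if [ "$(md5 -q "$source_dir/$f")" != "$(md5 -q "$target_dir/$f")" ]
    then
        echo "differs: $f"
    fi
done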
Upvotes: 1
Reputation: 30210
This should give you some ideas:
#!/bin/bash
DIR1="tmpa"
DIR2="tmpb"
function sorted_contents
{
    cd "$1"
    find . -type f | sort
}
DIR1_CONTENTS=$(sorted_contents "$DIR1")
DIR2_CONTENTS=$(sorted_contents "$DIR2")
diff -y <(echo "$DIR1_CONTENTS") <(echo "$DIR2_CONTENTS")
In my test directories, the output was:
[user@host so]$ ./dirdiff.sh
./address-book.dat            ./address-book.dat
./passwords.txt               ./passwords.txt
./some-song.mp3             <
./the-holy-grail.info         ./the-holy-grail.info
                            > ./victory.wav
./zzz.wad                     ./zzz.wad
If it's not clear, "some-song.mp3" was only in the first directory while "victory.wav" was only in the second. The rest of the files were common.
Note that this only compares the file names, not the contents. If you like where this is headed, you could play with the diff options (maybe --suppress-common-lines if you want cleaner output).
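For instance (a sketch reusing the listings from the script above; --suppress-common-lines is a GNU diff option and may not exist in every diff implementation):

diff -y --suppress-common-lines <(echo "$DIR1_CONTENTS") <(echo "$DIR2_CONTENTS")
# ./some-song.mp3                   <
#                                   > ./victory.wav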
But this is probably how I'd approach it -- offload a lot of the work onto diff.
EDIT: I should also point out that something as simple as:
[user@host so]$ diff tmpa tmpb
would also work:
Only in tmpa: some-song.mp3
Only in tmpb: victory.wav
... but not feel as satisfying as writing a script yourself. :-)
Upvotes: 1