methuselah

Reputation: 13206

Comparing two directories to produce output

I am writing a Bash script that will replace the files in folder A (source) with those in folder B (target). But before this happens, I want to record two file lists.

How do I accomplish this in Bash? I've tried using diff -qr but it yields the following output:

Files old/VERSION and new/VERSION differ
Files old/conf/mime.conf and new/conf/mime.conf differ
Only in new/data/pages: playground
Files old/doku.php and new/doku.php differ
Files old/inc/auth.php and new/inc/auth.php differ
Files old/inc/lang/no/lang.php and new/inc/lang/no/lang.php differ
Files old/lib/plugins/acl/remote.php and new/lib/plugins/acl/remote.php differ
Files old/lib/plugins/authplain/auth.php and new/lib/plugins/authplain/auth.php differ
Files old/lib/plugins/usermanager/admin.php and new/lib/plugins/usermanager/admin.php differ

I've also tried this:

(rsync -rcn --out-format="%n" old/ new/ && rsync -rcn --out-format="%n" new/ old/) | sort | uniq

but it doesn't give me the scope of results I require. The problem is that the data isn't in the correct format: I want only files, not directories, to show in the text files, e.g.:

conf/mime.conf
data/pages/playground/
data/pages/playground/playground.txt
doku.php
inc/auth.php
inc/lang/no/lang.php
lib/plugins/acl/remote.php
lib/plugins/authplain/auth.php
lib/plugins/usermanager/admin.php

Upvotes: 0

Views: 86

Answers (1)

Adam Katz

Reputation: 16128

List of files in directory B (new/) that are newer than directory A (old/):

find new -newermm old

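This runs find over the contents of new/ and filters them with -newerXY reference, where X and Y are both set to m (modification time) and the reference is the old directory itself. Note that the comparison is against the modification time of the old directory entry, not of the individual files inside it.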
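If you want only regular files, with paths relative to the directory as in the question's sample output, the same test can be combined with -type f and -printf. This is a minimal sketch, assuming GNU find and GNU touch; the function name newer_files and the throwaway demo tree are hypothetical and only there so it can be run as-is:

```shell
#!/bin/bash
# Hypothetical helper: list regular files under NEW whose mtime is later
# than the mtime of the OLD directory itself, as paths relative to NEW.
# Assumes GNU find (-newermm and -printf are not POSIX).
newer_files() {
  local old=$1 new=$2
  find "$new" -type f -newermm "$old" -printf '%P\n' | sort
}

# Throwaway demo tree with fixed timestamps (GNU touch -d).
demo=$(mktemp -d)
mkdir -p "$demo/old" "$demo/new/conf"
touch -d '2019-01-01' "$demo/new/VERSION"         # older than the reference
touch -d '2021-01-01' "$demo/new/conf/mime.conf"  # newer than the reference
touch -d '2020-01-01' "$demo/old"                 # reference timestamp

newer_result=$(newer_files "$demo/old" "$demo/new")
printf '%s\n' "$newer_result"   # prints conf/mime.conf

rm -rf "$demo"
```

Here %P prints each path with the starting directory removed, so no sed step is needed for this half.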

Files that are missing in directory B (new/) but are present in directory A (old/):

A=old B=new
diff -u <(find "$B" |sed "s:$B::") <(find "$A" |sed "s:$A::") \
  |sed "/^+\//!d; s::$A/:"

This sets the variables $A and $B to your directories, then runs a unified diff on their file listings (using process substitution to list each tree with find and strip the directory name with sed, so that diff compares like with like). The final sed command deletes every line that does not start with +/ (the additions, i.e. paths found only under $A), then replaces that leading +/ with the directory name and a slash.
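find does not promise any particular output order, so the two listings may not line up the way diff expects. As an alternative sketch (not the command above), the same question can be answered with sort and comm. It assumes GNU find for -printf; the function name only_in_first and the demo tree are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print paths (relative to each root) that exist
# under directory A but not under directory B.
only_in_first() {
  local a=$1 b=$2
  # comm needs both inputs sorted the same way;
  # -23 suppresses columns 2 and 3, keeping lines unique to the first input.
  comm -23 <(find "$a" -mindepth 1 -printf '%P\n' | sort) \
           <(find "$b" -mindepth 1 -printf '%P\n' | sort)
}

# Throwaway demo tree: inc/auth.php exists only under old/.
demo=$(mktemp -d)
mkdir -p "$demo/old/inc" "$demo/new/inc"
touch "$demo/old/doku.php" "$demo/new/doku.php" "$demo/old/inc/auth.php"

missing_result=$(only_in_first "$demo/old" "$demo/new")
printf '%s\n' "$missing_result"   # prints inc/auth.php

rm -rf "$demo"
```

Add -type f to both find calls if directories should be left out of the list.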

Here is a bash script that will create the file:

#!/bin/bash
# Usage: bash script.bash OLD_DIR NEW_DIR [OUTPUT_FILE]
# compare given directories

if [ -n "$3" ]; then # the optional 3rd argument is the output file
  OUTPUT="$3"
else # if it isn't provided, escape path slashes to underscores
  OUTPUT="${2////_}-newer-than-${1////_}"
fi

{
  find "$2" -newermm "$1"
  diff -u <(find "$2" |sed "s:$2::") <(find "$1" |sed "s:$1::") \
    |sed "/^+\//!d; s::$1/:"
} |sort > "$OUTPUT"

First, this determines the output file name, which comes from the third argument if one is given. Otherwise it is built from the two directory arguments, with slashes converted to underscores in case they are paths. For example, running bash script.bash /usr/local/bin /usr/bin would write its file list to _usr_bin-newer-than-_usr_local_bin in the current working directory.
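The run of four slashes is easier to read once it is split up: // means "replace every match", the next / is the pattern, and the last / separates the pattern from the replacement. A small sketch with hypothetical variable names standing in for the positional arguments:

```shell
#!/bin/bash
# ${var//pattern/replacement} replaces every match of pattern in var.
# Here the pattern is "/" and the replacement is "_".
old_dir=/usr/local/bin   # stands in for $1
new_dir=/usr/bin         # stands in for $2

default_output="${new_dir////_}-newer-than-${old_dir////_}"
printf '%s\n' "$default_output"   # prints _usr_bin-newer-than-_usr_local_bin
```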

The braces group the two commands so that their combined output is sorted into a single list. There won't be any duplicates, so you don't need to worry about that (if there were, you'd use sort -u).

You can get your first and second files by changing the order of arguments as you invoke this script.

Upvotes: 2
