Sandra Schlichting

Reputation: 26046

How to find files that change?

I would like to find the file names of the files that change in the test case at the bottom of this post.

It outputs

before
d41d8cd98f00b204e9800998ecf8427e  FFF/c.txt
d41d8cd98f00b204e9800998ecf8427e  FFF/a.txt
d41d8cd98f00b204e9800998ecf8427e  FFF/b.txt

after
d41d8cd98f00b204e9800998ecf8427e  FFF/c.txt
d41d8cd98f00b204e9800998ecf8427e  FFF/d.txt
d8e8fca2dc0f896fd7cb4cb0031ba249  FFF/b.txt

Question

How do I get the file names of the files that have changed?

In this case a.txt has been deleted, d.txt has been added, and the md5sum of b.txt has changed.

#!/bin/bash

mkdir -p FFF
touch FFF/a.txt
rm -f FFF/b.txt
touch FFF/b.txt
touch FFF/c.txt
rm -f FFF/d.txt

echo "before"    
find FFF -name "*.txt" -exec md5sum '{}' \;
echo ""

# makes some changes that I want to catch
rm -f FFF/a.txt
echo "test" > FFF/b.txt
touch FFF/d.txt

echo "after"
find FFF -name "*.txt" -exec md5sum '{}' \;

Upvotes: 3

Views: 588

Answers (5)

David Andersson

Reputation: 765

Another alternative is to use a file system watcher such as inotify, dnotify, fam, or gamin. Examples:

inotifywait -m /home/david

dnotify -all -r /home/david

Add options to perform certain commands or pipe their output to a read/process loop.

Upvotes: 0

David W.

Reputation: 107090

Okay, what's your setup?

  • Are you comparing two directories and need to know which files have changed in each? If so, diff -r will show you what was added, deleted, and modified in the directories involved. You may have to use diffdir or dirdiff on Solaris.
  • Are you looking for files modified after a particular date? You can use find $dir -mtime N, which matches files whose modification time is more (+N) or less (-N) than N days in the past.

For example:

$ find $dir -mtime +3

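will find files last modified more than three days ago (find discards the fractional part of a file's age in days, so +3 really means four or more full days), while: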

$ find $dir -mtime -3

will find files modified less than three days ago. GNU and BSD find also have -mmin, which works the same way but counts minutes.
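For the first case (comparing two directories), a minimal sketch assuming GNU diff, where -q reports only which files differ. FFF.before is a hypothetical name for a copy of the question's FFF directory taken before the changes:

```shell
#!/bin/bash
# Work in a scratch directory so the sketch never touches real files.
cd "$(mktemp -d)" || exit 1

mkdir -p FFF
touch FFF/a.txt FFF/b.txt FFF/c.txt

# Keep a copy of the "before" state (FFF.before is an illustrative name).
cp -a FFF FFF.before

# The same changes as in the question's test case.
rm -f FFF/a.txt
echo "test" > FFF/b.txt
touch FFF/d.txt

# -r recurses into subdirectories; -q reports only which files differ.
# diff exits with status 1 when differences are found, which is expected here.
diff -rq FFF.before FFF
```

On this test case it should report a.txt as only in the old copy, b.txt as differing, and d.txt as only in the new tree; the unchanged c.txt is not listed.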

If you're looking for changes that have taken place between two arbitrary points in time, then I suggest you look into using a version control system. A good version control system will give you the flexibility you want without having to reinvent the wheel. A single command (like svn log -rPREV:HEAD -v) can give you everything you need.

The two most popular version control systems are Subversion and Git. I find Subversion to be easier to use and set up, but Git is better if you have to share your code with others and don't have a central server. Bazaar has a nice interface and is also fairly simple. I'm just starting to play with it.

Upvotes: 2

Shawn Chin

Reputation: 86974

If you store the output of both find commands in temp files, you can run diff on them to figure out which files have changed. A sample output would be:

[me@home]$ diff -u ori.temp new.temp | tail -n+4 | grep "^[-+]" | sort -k2

-d41d8cd98f00b204e9800998ecf8427e  FFF/a.txt
-d41d8cd98f00b204e9800998ecf8427e  FFF/b.txt
+d41d8cd98f00b204e9800998ecf8427e  FFF/d.txt
+d8e8fca2dc0f896fd7cb4cb0031ba249  FFF/b.txt

You should be able to parse that output to determine the changed files. The 2nd column gives you the file names. Lines that start with - are deletions (unless a + line exists for the same file, which means it's an edit), while lines that start with + are additions.

The trailing sort -k2 sorts the output by the 2nd column, making it easier to spot edits (a file that appears twice).


Parsing the output of diff can be done quite easily with a few lines of awk or even pure bash. Unfortunately, my bash/awk-fu is not up to par, so here's my take on your script, which uses a smattering of Python.

#!/bin/bash
# set up initial state
mkdir -p FFF && touch FFF/a.txt && rm -f FFF/b.txt 
touch FFF/b.txt FFF/c.txt && rm -f FFF/d.txt

# capture current state (sorted by file name so both snapshots line up)
TMP_ORI=$(mktemp)
find FFF -name "*.txt" -exec md5sum '{}' \; | sort -k2 > "$TMP_ORI"

# makes some changes that I want to catch
rm -f FFF/a.txt && echo "test" > FFF/b.txt && touch FFF/d.txt

# capture new state
TMP_NEW=$(mktemp)
find FFF -name "*.txt" -exec md5sum '{}' \; | sort -k2 > "$TMP_NEW"

# run diff and parse output (Python 3)
diff -u "$TMP_ORI" "$TMP_NEW" | tail -n+4 | grep "^[-+]" | python3 -c '
import fileinput
modes = {"+": "added", "-": "removed"}
visited = {}
for line in fileinput.input():          # for each line from stdin
    checksum, file = line.split()       # split the columns
    if file in visited:
        visited[file] = "modified"      # file appeared on both sides
    else:
        visited[file] = modes[checksum[0]]  # map "+/-" to "added/removed"

for file, mode in visited.items():      # print results
    print("%s\t%s" % (file, mode))
'

rm -f "$TMP_ORI" "$TMP_NEW" # delete temp files

Running this script will give the following output:

[me@home]$ ./sandras_script.sh
FFF/d.txt       added
FFF/a.txt       removed
FFF/b.txt       modified
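As noted above, the parsing step can also be done in awk. Below is a minimal sketch of the same added/removed/modified logic, assuming md5sum from GNU coreutils, file names without whitespace, and both snapshots sorted by file name; classify, before.md5 and after.md5 are illustrative names, not part of the original script.

```shell
#!/bin/bash
# classify OLD NEW: label each changed file as added, removed or modified,
# based on the +/- lines of a unified diff between two md5sum snapshots.
classify() {
    diff -u "$1" "$2" | tail -n +4 | grep '^[-+]' | awk '
        {
            sign = substr($1, 1, 1)      # "+" (new side) or "-" (old side)
            file = $2
            if (file in state) {
                state[file] = "modified" # file appears on both sides of the diff
            } else {
                order[++n] = file        # remember first-seen order for printing
                state[file] = (sign == "+") ? "added" : "removed"
            }
        }
        END {
            for (i = 1; i <= n; i++) printf "%s\t%s\n", order[i], state[order[i]]
        }'
}

# Demo on the question's test case, in a scratch directory.
cd "$(mktemp -d)" || exit 1
mkdir FFF && touch FFF/a.txt FFF/b.txt FFF/c.txt
find FFF -name "*.txt" -exec md5sum '{}' \; | sort -k2 > before.md5

rm -f FFF/a.txt
echo "test" > FFF/b.txt
touch FFF/d.txt
find FFF -name "*.txt" -exec md5sum '{}' \; | sort -k2 > after.md5

classify before.md5 after.md5
```

Keeping the first-seen order in a separate array makes the output order predictable, unlike looping with for (file in state), whose order awk leaves unspecified.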

Upvotes: 2

Bryan Oakley

Reputation: 386362

find has several options for selecting files that have changed since a given point in time. For example, you could touch a temporary file at the start of the script, then run find -newer tmpfile to find all files that have been modified since you touched that temporary file.
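A minimal sketch of that idea on the question's test case. start.stamp is a hypothetical marker name, and the sleep 1 is a precaution assumed here for filesystems with one-second timestamp resolution, since -newer only matches files strictly newer than the marker:

```shell
#!/bin/bash
# Work in a scratch directory so the sketch never touches real files.
cd "$(mktemp -d)" || exit 1
mkdir FFF && touch FFF/a.txt FFF/b.txt FFF/c.txt

touch start.stamp   # marker file; its mtime records "the start of the script"
sleep 1             # keep later changes in a strictly later second

# The same changes as in the question's test case.
rm -f FFF/a.txt
echo "test" > FFF/b.txt
touch FFF/d.txt

# Regular files under FFF modified (or created) after the marker.
changed=$(find FFF -type f -newer start.stamp | sort)
printf '%s\n' "$changed"
```

This reports only b.txt and d.txt: a deleted file no longer exists for find to examine, so a.txt cannot show up. Catching deletions needs one of the snapshot comparisons from the other answers.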

Upvotes: 4

Mark Longair

Reputation: 468191

Identifying files that have changed between particular states by their hashes (and presence in the directory structure) is essentially what the version control system git does anyway, so why not just use that? Here's a slight modification of your script, which adds the following steps:

  1. A first step to initialize the current directory as a git repository.
  2. After the first set of files are created, it creates a commit from the current state of the directory.
  3. After the subsequent set of modifications, it creates a second commit to record the modified state of the directory.
  4. Finally, it uses git diff to show the changes between those two commits.

The modified script looks like:

#!/bin/bash

# Initialize the current directory as a git repository:
git init

mkdir -p FFF
touch FFF/a.txt
rm -f FFF/b.txt
touch FFF/b.txt
touch FFF/c.txt
rm -f FFF/d.txt

echo "before"    
find FFF -name "*.txt" -exec md5sum '{}' \;
echo ""

# Record the state of the directory as a new commit:
git add -A .
git commit -m "Initial state"

# makes some changes that I want to catch
rm -f FFF/a.txt
echo "test" > FFF/b.txt
touch FFF/d.txt

echo "after"
find FFF -name "*.txt" -exec md5sum '{}' \;

# Record the modified state of the directory as a second commit:
git add -A .
git commit -m "New state"

# Output the difference between those two commits:
git diff --name-only HEAD^ HEAD

The output from that script is then:

Initialized empty Git repository in /home/mark/tmp/foobar/.git/
before
d41d8cd98f00b204e9800998ecf8427e  FFF/b.txt
d41d8cd98f00b204e9800998ecf8427e  FFF/c.txt
d41d8cd98f00b204e9800998ecf8427e  FFF/a.txt

[master (root-commit) 8a6d1d9] Initial state
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 FFF/a.txt
 create mode 100644 FFF/b.txt
 create mode 100644 FFF/c.txt
after
d41d8cd98f00b204e9800998ecf8427e  FFF/d.txt
d8e8fca2dc0f896fd7cb4cb0031ba249  FFF/b.txt
d41d8cd98f00b204e9800998ecf8427e  FFF/c.txt
[master 810b0f5] New state
 2 files changed, 1 insertions(+), 0 deletions(-)
 rename FFF/{a.txt => d.txt} (100%)
FFF/a.txt
FFF/b.txt
FFF/d.txt

The last 3 lines are the output from the git diff command.
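If you also want to know what kind of change happened, git diff --name-status prints a status letter (A, D or M) before each path. A minimal sketch in a scratch repository; the demo identity passed with -c is a placeholder, and --no-renames is added on the assumption that you want the two identical empty files a.txt and d.txt reported as a deletion plus an addition rather than as the rename seen in the commit output above:

```shell
#!/bin/bash
# Work in a scratch repository so the sketch never touches real files.
cd "$(mktemp -d)" || exit 1
git init -q .

# Placeholder identity so the commits work without any global git config.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

mkdir FFF && touch FFF/a.txt FFF/b.txt FFF/c.txt
git add -A .
commit "Initial state"

# The same changes as in the question's test case.
rm -f FFF/a.txt
echo "test" > FFF/b.txt
touch FFF/d.txt
git add -A .
commit "New state"

# --name-status: status letter, a tab, then the path.
# --no-renames: report a.txt/d.txt as D + A instead of a rename.
git diff --name-status --no-renames HEAD^ HEAD
```

On this test case the expected result is D for FFF/a.txt, M for FFF/b.txt and A for FFF/d.txt.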

Upvotes: 2
