s-m-e

Reputation: 3729

How to determine whether files have been changed in a directory tree without traversing the entire tree?

Imagine a directory tree (on Linux):

user@computer:~/demo> find .
.
./test1
./test1/test1_a
./test1/test1_a/somefile_1a
./test1/test1_b
./test1/test1_b/somefile_1b
./test0
./test0/test0_a
./test0/test0_a/somefile_0a
./test0/test0_b
./test0/test0_b/somefile_0b

Scenario: I collect all available metadata about every directory and file in that tree (mtime, ctime, inode, size, checksums of file contents, ...), including the top-level directory, demo, and store it. Later, one or more files or directories change (modified, newly created, or deleted). Using the stored information, I now want to figure out what has changed.

My solution so far: I traverse the entire tree, look for changed metadata, and process it. Above a certain size, traversing a tree and looking at every directory and file becomes quite time-consuming, even when looking only at metadata (i.e. ctime, mtime etc., NOT file content checksums). Such a traversal can be optimized only to a certain degree (e.g. by reading the metadata of each file and folder only once per traversal instead of multiple times); at the end of the day, I/O speed becomes the bottleneck.
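For illustration, the full-traversal approach described above could look roughly like the following. This is only a minimal sketch in Python, not the exact code in use: the function names snapshot and compare are made up for this example, checksums are left out, and os.scandir is used so that each entry is stat'ed only once per traversal.

```python
import os


def snapshot(root):
    """Map each path (relative to root) to (mtime_ns, ctime_ns, inode, size)."""
    st = os.lstat(root)
    meta = {".": (st.st_mtime_ns, st.st_ctime_ns, st.st_ino, st.st_size)}
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # The stat result is cached on the entry, so metadata is
                # fetched once per file or directory.
                st = entry.stat(follow_symlinks=False)
                rel = os.path.relpath(entry.path, root)
                meta[rel] = (st.st_mtime_ns, st.st_ctime_ns, st.st_ino, st.st_size)
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return meta


def compare(old, new):
    """Return sorted lists of (created, deleted, changed) relative paths."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return created, deleted, changed
```

Calling snapshot("demo") before and after the changes and passing both results to compare gives the three lists. The cost is still one metadata read per entry in the tree, which is exactly the part I would like to avoid.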

Question: What options do I have (on Unix / Linux file systems) to look for changes in my tree without traversing all of it? I.e. is there any information stored for demo which tells me / indicates in some way that something below it (e.g. somefile_1b) has been changed? Are there any specific filesystems (EXT*, XFS, ZFS, ...) offering features of this kind?

Note: I am aware of the option of running a background process for monitoring changes to the filesystem. It would eliminate the need for a full traversal of my tree, though I am more interested in options which do NOT require a background monitoring process (if an option of this kind exists at all).

Upvotes: 1

Views: 997

Answers (1)

Andrew Henle

Reputation: 1

ZFS provides the capability via zfs diff ... Per the Oracle Solaris 11.2 documentation:

Identifying ZFS Snapshot Differences (zfs diff)

You can determine ZFS snapshot differences by using the zfs diff command.

For example, assume that the following two snapshots are created:

$ ls /tank/home/tim
fileA
$ zfs snapshot tank/home/tim@snap1
$ ls /tank/home/tim
fileA  fileB
$ zfs snapshot tank/home/tim@snap2

For example, to identify the differences between two snapshots, use syntax similar to the following:

$ zfs diff tank/home/tim@snap1 tank/home/tim@snap2
M       /tank/home/tim/
+       /tank/home/tim/fileB

In the output, the M indicates that the directory has been modified. The + indicates that fileB exists in the later snapshot.

The R in the following output indicates that a file in a snapshot has been renamed.

$ mv /tank/cindy/fileB /tank/cindy/fileC
$ zfs snapshot tank/cindy@snap2
$ zfs diff tank/cindy@snap1 tank/cindy@snap2
M       /tank/cindy/
R       /tank/cindy/fileB -> /tank/cindy/fileC

This only compares two snapshots, so you need the ability to create ZFS snapshots to use it effectively.
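If you want to process that output in a script, a small parser along these lines might be enough. It is a sketch only: Change and parse_zfs_diff are hypothetical names, it assumes the output layout shown in the examples above (a change code, whitespace, then the path, with " -> " separating old and new names for renames), and it does not run zfs itself.

```python
from typing import List, NamedTuple, Optional


class Change(NamedTuple):
    kind: str                       # change code, e.g. "M", "+" or "R"
    path: str                       # affected path (old path for renames)
    new_path: Optional[str] = None  # only set for renames


def parse_zfs_diff(output: str) -> List[Change]:
    """Parse text in the `zfs diff` layout shown above into Change records."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split the change code from the rest of the line on the first
        # run of whitespace; the path itself may contain spaces.
        kind, rest = line.split(None, 1)
        new_path = None
        if kind == "R" and " -> " in rest:
            rest, new_path = rest.split(" -> ", 1)
            new_path = new_path.strip()
        changes.append(Change(kind, rest.strip(), new_path))
    return changes
```

Feeding it the rename example above yields one record for the modified directory and one for the rename from fileB to fileC. Paths that themselves contain " -> " would confuse this simple split, so treat it as a starting point.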

Upvotes: 1
