user5611823

Reputation: 341

Using Rsync filter to include/exclude files

I'm trying to back up a filesystem, excluding /mnt but including one particular path within /mnt. It looks like --filter is recommended over --include and --exclude, but I can't get it to do my bidding. Example:

rsync -aA -H --numeric-ids -v --progress --delete \
  --filter="merge /tmp/mergefilter.txt" /  /mnt/data/mybackup/

My /tmp/mergefilter.txt says:

+ /mnt/data/i-want-to-rsyncthisdirectory/
- /dev
- /sys/
- /tmp/
- /run/
- /mnt/
- /proc/
- /media/
- /var/swap
- /lost+found/

All of the paths starting with "-" get ignored, but my include for /mnt/data/i-want-to-rsyncthisdirectory/ never gets rsync'd. Changing the order of the rules or adding/removing the trailing slash does not appear to change the behavior for the path I want included.

EDIT: Note that I do want to back up /etc, /usr, /var, etc., as per the source specified as /

Appreciate any guidance as the man page is a bit of a minefield...

Upvotes: 22

Views: 59302

Answers (3)

Tom

Reputation: 421

This question is quite old but I think this might help you:

(from rsync 3.1.2 manual)

Note that, when using the --recursive (-r) option (which is implied by -a), every subcomponent of every path is visited from the top down, so include/exclude patterns get applied recursively to each subcomponent's full name (e.g. to include "/foo/bar/baz" the subcomponents "/foo" and "/foo/bar" must not be excluded). The exclude patterns actually short-circuit the directory traversal stage when rsync finds the files to send. If a pattern excludes a particular parent directory, it can render a deeper include pattern ineffectual because rsync did not descend through that excluded section of the hierarchy. This is particularly important when using a trailing '*' rule. For instance, this won't work:

         + /some/path/this-file-will-not-be-found
         + /file-is-included
         - *

This fails because the parent directory "some" is excluded by the '*' rule, so rsync never visits any of the files in the "some" or "some/path" directories. One solution is to ask for all directories in the hierarchy to be included by using a single rule: "+ */" (put it somewhere before the "- *" rule), and perhaps use the --prune-empty-dirs option. Another solution is to add specific include rules for all the parent dirs that need to be visited. For instance, this set of rules works fine:

         + /some/
         + /some/path/
         + /some/path/this-file-is-found
         + /file-also-included
         - *

I proposed something in my original answer that actually does not work (I tested it). I have since reproduced a tree similar to yours, and this solution should work:

+ /mnt/
+ /mnt/data/
+ /mnt/data/i-want-to-rsyncthisdirectory/
- /mnt/data/*
- /mnt/*
- /dev
- /sys/
- /tmp/
- /run/
- /proc/
- /media/
- /var/swap
- /lost+found/

Explanations:

(This only rewords the manual, but as you said, the manual is a bit cryptic.)

For each file or directory it considers, rsync reads the rules from top to bottom and applies the first one that matches. In your case /mnt/data/i-want-to-rsyncthisdirectory/ is not backed up because you exclude /mnt/, so rsync never descends into it and the include rule never gets a chance to match. The solution is to include each folder and subfolder on the way down to the folder you want to back up, and then to exclude what you do not want, level by level.

Note the * at the end of each subfolder exclusion. It prevents rsync from backing up the other files and folders located in these subfolders, which I think is what you want.
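
The "include every parent" part of these rules can be generated mechanically. Below is a minimal sketch, assuming a hypothetical helper named parent_includes (it is not part of rsync) that prints one "+" rule for each ancestor of an absolute directory path, ending with the path itself:

```shell
#!/bin/sh
# parent_includes is a hypothetical helper, not an rsync feature.
# It prints "+ /a/", "+ /a/b/", ... for every component of an absolute
# directory path, so that no ancestor is left to be excluded.
parent_includes() {
    path=${1#/}      # drop the leading slash
    path=${path%/}   # drop a trailing slash, if any
    prefix=
    old_ifs=$IFS
    IFS=/            # split the path on slashes
    set -f           # do not glob-expand path components
    for part in $path; do
        prefix="$prefix/$part"
        printf '+ %s/\n' "$prefix"
    done
    set +f
    IFS=$old_ifs
}

parent_includes /mnt/data/i-want-to-rsyncthisdirectory/
```

The exclude rules (such as "- /mnt/data/*" and "- /mnt/*") still have to be written by hand and placed after the generated includes in the filter file.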

Simpler solution: (edit 2)

You can even simplify this with the *** pattern that was added in version 2.6.7:

+ /mnt/
+ /mnt/data/
+ /mnt/data/i-want-to-rsyncthisdirectory/***
- /mnt/**

A trailing *** matches both the directory itself and everything below it, so a single ** exclude is enough to cover the rest of /mnt.

I also discovered that you can understand which filter rules exclude/include each file or folder thanks to the following rsync arguments:

--verbose --verbose

Combined with the --dry-run argument, you should be able to debug your problem :)
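
To try this without touching a real filesystem, here is a small sketch that builds a throwaway tree and runs the simplified rules against it. The directory names (wanted, other) and the scratch layout are placeholders invented for the demo, and the rsync call is skipped if rsync is not installed:

```shell
#!/bin/sh
# Scratch demo: everything is created under a temporary directory.
# "wanted" and "other" are placeholder names, not paths from the question.
set -eu
work=$(mktemp -d)
mkdir -p "$work/src/mnt/data/wanted" "$work/src/mnt/data/other" \
         "$work/src/etc" "$work/dst"
echo keep > "$work/src/mnt/data/wanted/file.txt"
echo skip > "$work/src/mnt/data/other/file.txt"
echo conf > "$work/src/etc/hosts"

# Same shape as the simplified rules: include the parents, include the
# wanted directory with ***, then exclude everything else under /mnt.
cat > "$work/filter.txt" <<'EOF'
+ /mnt/
+ /mnt/data/
+ /mnt/data/wanted/***
- /mnt/**
EOF

if command -v rsync >/dev/null 2>&1; then
    # --dry-run transfers nothing; -vv reports which rule matched each path.
    rsync -a --dry-run -vv --filter="merge $work/filter.txt" \
        "$work/src/" "$work/dst/"
fi
```

With two levels of verbosity, the output should contain lines saying that a path was hidden or shown "because of pattern ..."; the exact wording may vary between rsync versions. Remove the scratch directory ($work) when you are done.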

Upvotes: 32

shearn89

Reputation: 897

In case someone else is battling with this as I was, I managed to get the following to work. In my case I'm selectively syncing repositories from another server.

Place filters in a file:

+ epel/
+ epel/7/
+ epel/7/x86_64/
+ epel/7/x86_64/Packages**
+ epel/7/x86_64/repodata**
- **

I can then sync everything as intended with:

cd /srv/repo
rsync -rvzP -f 'merge /home/user/sync-filter.txt' ./ user@remote:/srv/repo/

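Initially, I had my filter file set up with epel/7/x86_64/Packages/**, which failed to work because of the trailing slash before the **: that pattern matches only the contents of Packages, not the Packages directory itself, so the directory was still caught by the final - ** rule. Removing the / made it all spring into life as intended!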

Upvotes: 4

Marcus

Reputation: 932

For me, this command does the job:

rsync -aA -H --numeric-ids -v --progress --delete \
--filter="+ /mnt/data/i-want-to-rsyncthisdirectory/" \
--filter="- *" . /mnt/data/mybackup/

Basically, I used a + filter for the directory in question and excluded all the others (as you do in your given example).

There is no need to explicitly negate all the directories you do not want to sync. Instead, you can ignore all except the one in question.

Upvotes: 12
