Reputation: 393
I need to rsync files from a bucket to a local machine every day, and the bucket contains 20k files. I only need to download the changed files whose names end with *some_naming_convention.csv. What's the best way to do that? Using a wildcard in the download source gave me an error.
Upvotes: 4
Views: 4761
Reputation: 187
From here, you can do something like
gsutil rsync -r -x '^(?!.*\.json$).*' gs://mybucket mydir
to rsync only the JSON files. The key is the ?! prefix in front of the pattern you actually want.
The -x flag excludes every path that matches a pattern. The pattern ^(?!.*\.json$).* uses a negative look-ahead to match paths that do not end in .json. It follows that the gsutil rsync call will get only the files which end in .json.
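To check the look-ahead trick against the naming convention from the question before running it on 20k files, here is a minimal sketch in Python (gsutil's -x patterns are Python regular expressions). The helper names and the sample paths are made up for illustration, and the sketch assumes the pattern is tested from the start of each object's path relative to the source.

```python
import re


def exclude_pattern_for_suffix(suffix: str) -> str:
    """Build a -x style pattern matching every path that does NOT end in `suffix`."""
    # re.escape makes the dot in ".csv" literal instead of "any character".
    return r"^(?!.*" + re.escape(suffix) + r"$).*"


def is_synced(path: str, suffix: str) -> bool:
    """True if `path` is not hit by the exclusion pattern, i.e. it would be synced."""
    return re.match(exclude_pattern_for_suffix(suffix), path) is None


if __name__ == "__main__":
    suffix = "some_naming_convention.csv"
    # Hypothetical object names, only to show which ones survive the filter.
    samples = [
        "2020_some_naming_convention.csv",
        "sub/dir/2021_some_naming_convention.csv",
        "notes.txt",
        "2020_some_naming_convention.csv.bak",
    ]
    print(exclude_pattern_for_suffix(suffix))
    for name in samples:
        print(name, "synced" if is_synced(name, suffix) else "excluded")
```

If the sample names come out as expected, the printed pattern can be passed to gsutil rsync -x in single quotes, in place of the .json pattern above.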
Upvotes: 2
Reputation: 2680
I don't think you can do that with rsync. As Christopher told you, you can skip files by using the "-x" flag, but not sync only those [1]. I created a public Feature Request on your behalf [2] so you can follow updates there.
As I say in the FR, IMHO this doesn't follow the purpose of rsync, which is to keep folders/buckets synchronised; synchronising only some of the files doesn't fall within that purpose.
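There is a possible "workaround": use gsutil cp to copy the files, with -n to skip the ones that already exist at the destination (so files that changed in the bucket after they were first downloaded are not refreshed). The whole command for your case should be:
gsutil -m cp -n <bucket>/*some_naming_convention.csv <directory>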
Another option, maybe a little more far-fetched, is to copy/move those files to a folder and then rsync that folder.
I hope this works for you ;)
Upvotes: 5
Reputation: 25599
Rsync lets you include and exclude files matching patterns.
For each file, rsync applies the first pattern that matches, so if you want to sync only selected files you need to include those and then exclude everything else.
Add the following to your rsync options:
--include='*some_naming_convention.csv' --exclude='*'
That's enough if all your files are in one directory. If you also want to search subfolders, you need a little bit more:
--include='*/' --include='*some_naming_convention.csv' --exclude='*'
This will duplicate the whole directory tree, but only copy the files you want. If that leaves empty directories you don't want, add --prune-empty-dirs.
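To see why the order of the rules matters, and why the '*/' include is needed for subfolders, here is a rough Python model of the "first matching pattern wins" behaviour. It is a deliberate simplification, not rsync's real matcher: it only looks at a single file or directory name, treats a trailing slash as "directories only", and uses fnmatch for the wildcard. The function and variable names are made up for the example.

```python
from fnmatch import fnmatchcase


def decide(rules, name, is_dir=False):
    """Return the action of the first rule whose pattern matches `name`."""
    for action, pattern in rules:
        if pattern.endswith("/"):
            # A trailing slash restricts the rule to directories.
            if not is_dir:
                continue
            pattern = pattern[:-1]
        if fnmatchcase(name, pattern):
            return action
    # Nothing matched: the entry is transferred by default.
    return "include"


# Rules in the same order as the command-line options above.
flat = [("include", "*some_naming_convention.csv"), ("exclude", "*")]
recursive = [("include", "*/")] + flat

if __name__ == "__main__":
    print(decide(flat, "a_some_naming_convention.csv"))  # wanted file
    print(decide(flat, "other.txt"))                     # falls through to exclude '*'
    print(decide(flat, "sub", is_dir=True))              # directory hit by exclude '*'
    print(decide(recursive, "sub", is_dir=True))         # directory kept by include '*/'
```

In this model the flat rule set excludes the directory "sub" itself, so nothing inside it would ever be considered; putting the '*/' include first keeps directories in play so the file patterns can be applied inside them.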
Upvotes: -3