PaaPs

Reputation: 393

gsutil rsync only files matching a pattern

I need to rsync files from a bucket to a local machine every day, and the bucket contains 20k files. I need to download only the changed files whose names end with *some_naming_convention.csv. What's the best way to do that? Using a wildcard in the download source gave me an error.

Upvotes: 4

Views: 4761

Answers (3)

Logan

Reputation: 187

Original Answer

From here, you can do something like gsutil rsync -r -x '^(?!.*\.json$).*' gs://mybucket mydir to sync only the JSON files. The key is putting the ?! (negative look-ahead) prefix in front of the pattern you actually want.

Edit

The -x flag excludes files whose names match a pattern. The pattern ^(?!.*\.json$).* uses a negative look-ahead to match every name that does not end in .json, so all of those are excluded and the gsutil rsync call transfers only the files that end in .json.
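A minimal sketch applying the same idea to the question's suffix. gs://mybucket and ./mydir are placeholders, and the preview step assumes GNU grep built with PCRE support (grep -P). The preview only shows locally which relative paths the pattern matches, and therefore which ones -x would skip; gsutil itself evaluates -x as a Python regular expression, which handles this look-ahead the same way.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Exclusion regex: matches every relative path that does NOT end in
# some_naming_convention.csv, so only matching files survive the sync.
PATTERN='^(?!.*some_naming_convention\.csv$).*'

# Local preview with made-up names: prints the paths -x would exclude
# (here: other.csv and notes.json).
printf '%s\n' \
  'a_some_naming_convention.csv' \
  'sub/b_some_naming_convention.csv' \
  'other.csv' \
  'notes.json' |
  grep -P "$PATTERN" || true

# The real sync (placeholders; uncomment to run). -n is a dry run that only
# reports what would be copied, so try it before the real call.
# gsutil -m rsync -r -n -x "$PATTERN" gs://mybucket ./mydir
# gsutil -m rsync -r -x "$PATTERN" gs://mybucket ./mydir
```

Because rsync only transfers files that differ from the local copy, this also covers the "only changed files" part of the question.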

Upvotes: 2

Iñigo

Reputation: 2680

I don't think you can do that with rsync. As Christopher told you, you can skip files by using the "-x" flag, but you can't sync only those [1]. I created a public Feature Request on your behalf [2] so you can follow updates there.

As I say in the FR, IMHO this doesn't fit the purpose of rsync, which is to keep folders/buckets synchronized; synchronizing only some of the files falls outside that purpose.

A possible "workaround" is to use gsutil cp to copy the files, with -n to skip the ones that already exist locally. The full command for your case would be:

gsutil -m cp -n "gs://<bucket>/*some_naming_convention.csv" <directory>
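A minimal sketch of a daily wrapper around that command. The function name fetch_matching, its default suffix argument, and the bucket/directory in the usage line are hypothetical. The source must be a gs:// URL, and quoting the wildcard keeps the local shell from trying to expand it before gsutil sees it. Note that because -n never overwrites an existing local file, objects that changed in the bucket after their first download are not fetched again; this picks up new files only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Copy objects whose names end in a suffix, skipping files already present
# locally. All arguments are caller-supplied placeholders.
fetch_matching() {
  local bucket="$1" dest_dir="$2" suffix="${3:-some_naming_convention.csv}"
  mkdir -p "$dest_dir"
  # Quoted wildcard: gsutil, not the local shell, matches it against the bucket.
  # -n (no-clobber): existing local files are left alone, even if the
  # object in the bucket has changed since it was downloaded.
  gsutil -m cp -n "${bucket}/*${suffix}" "$dest_dir"
}

# Example daily invocation (placeholders):
# fetch_matching gs://mybucket "$HOME/bucket_copy"
```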

Another option, maybe a bit more far-fetched, is to copy or move those files to a separate folder and then rsync that folder.
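A rough sketch of that two-step approach, assuming a hypothetical staging prefix filtered/ inside the same bucket; the function name stage_and_sync is also made up. The first step is a server-side copy of the matching objects, the second syncs only the staging prefix down to the local directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stage matching objects under a separate prefix, then rsync only that prefix.
# "filtered" is a hypothetical prefix name; arguments are placeholders.
stage_and_sync() {
  local bucket="$1" dest_dir="$2" suffix="${3:-some_naming_convention.csv}"
  local staging="${bucket}/filtered"
  mkdir -p "$dest_dir"
  # A single * matches only within the top level, so objects already
  # under filtered/ are not matched again.
  gsutil -m cp "${bucket}/*${suffix}" "${staging}/"
  # Download the staging prefix; -r recurses into any sub-prefixes.
  gsutil -m rsync -r "$staging" "$dest_dir"
}

# Example (placeholders): stage_and_sync gs://mybucket "$HOME/bucket_copy"
```

It may be worth running the rsync step once with -n (dry run) to check what it would download before scheduling it.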

I hope this works for you ;)

Upvotes: 5

ams

Reputation: 25599

Rsync lets you include and exclude files matching patterns.

For each file, rsync applies the first pattern that matches, so if you want to sync only selected files you need to include those first and then exclude everything else.

Add the following to your rsync options:

--include='*some_naming_convention.csv' --exclude='*'

That's enough if all your files are in one directory. If you also want to search subfolders, you need a little bit more:

--include='*/' --include='*some_naming_convention.csv' --exclude='*'

This recreates the whole directory tree but copies only the files you want. If that leaves empty directories you don't want, add --prune-empty-dirs.

Upvotes: -3
