Eric Uldall

Reputation: 2391

Google Storage Rsync Exclusion

I'm trying to use the gsutil rsync utility to sync only files that match a certain date string, e.g. 14-Sep-2015.

The file structure is as follows: bucket/123/CODE_14-Sep-2015.txt

So when I rsync I only want to sync files matching a certain date. This is because we occasionally prune old files from the local system and don't want to keep rsyncing those old files.

Here's what I'm attempting:

gsutil -m rsync -n -x '[0-9]+/[A-Za-z0-9]+_((?!15-Aug-2015).*)' -r gs://bucket folder;

When I tested the regex in a test environment it seemed to work fine, but rsync is still pulling other files that do not match that date.

Any idea why this is not working as expected? Is there a better way to achieve this than rsync?

Upvotes: 0

Views: 974

Answers (2)

Eric Uldall

Reputation: 2391

My regexp is correct, but did not work on my version of python/gsutil for some reason. I've found an easier way to achieve the desired result though.

Simply using copy:

gsutil cp -r gs://bucket_name/*/*15-Sep-2015.txt destination_folder

It's just using wildcards, no regexp required.

Note: cp does not seem to copy the directory structure the way rsync does. For example:

gs://bucket/123/file.txt gets copied to /destination_folder/file.txt instead of /destination_folder/123/file.txt

UPDATE:

The copy functionality works as documented, although it seems a bit counterintuitive to me. I did find a working solution using rsync, though.

I had an extra directory layer that I was not matching, and that broke my entire regexp. It's worth noting that you must match the entire path after the source URL (here, the bucket name) for the regexp to work.

Working Answer:

gsutil -m rsync -n -x '[0-9]+/[A-Za-z0-9]+_((?!15-Aug-2015).*)' -r gs://bucket/subfolder folder;
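To see why the extra directory layer mattered, here is a minimal sketch in Python. It assumes that gsutil applies the -x pattern with re.match (anchored at the start) to each path relative to the source URL; the helper name is_excluded and the sample paths are hypothetical, for illustration only.

```python
import re

# Exclusion pattern from the command above (the value passed to -x).
EXCLUDE = re.compile(r'[0-9]+/[A-Za-z0-9]+_((?!15-Aug-2015).*)')


def is_excluded(relative_path):
    """Return True if the pattern matches from the start of the path.

    Assumption: this mimics the -x check by using re.match on the
    path relative to the source URL. It is not gsutil's actual code.
    """
    return EXCLUDE.match(relative_path) is not None


# Source is gs://bucket/subfolder, so paths start at the numeric directory.
print(is_excluded('123/CODE_14-Sep-2015.txt'))  # True: other date, skipped
print(is_excluded('123/CODE_15-Aug-2015.txt'))  # False: wanted date, synced

# Source is gs://bucket, so the path starts with "subfolder/", the
# pattern cannot match at the start, and nothing is excluded.
print(is_excluded('subfolder/123/CODE_14-Sep-2015.txt'))  # False
```

Under that assumption, the last case is the original symptom: when no path matches the exclusion pattern, every file is synced.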

Upvotes: 1

m.cekiera

Reputation: 5395

With (?!15-Aug-2015), the regex matches every date except 15-Aug-2015. If you want to match files with a particular date, it would be better to use a positive lookahead, like:

[0-9]+/[A-Za-z0-9]+_((?=14-Sep-2015)).*


But if the goal is exclusion, you should probably add the expected date format after (?!15-Aug-2015). Without it, the pattern matches every file that matches [0-9]+/[A-Za-z0-9]+_ and is not followed by the excluded date, including files with no date at all. To avoid that, try:

[0-9]+/[A-Za-z0-9]+_((?!15-Sep-2015))\d{2}-[A-Za-z]{3}-\d{4}
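The difference between the two forms can be checked with a short Python sketch. The names LOOSE and STRICT and the sample paths are hypothetical; both patterns use the same date (15-Sep-2015) so that they can be compared directly, and re.match is used here only as a convenient way to test them.

```python
import re

# Lookahead only: anything after the underscore is accepted unless it
# starts with the given date.
LOOSE = re.compile(r'[0-9]+/[A-Za-z0-9]+_((?!15-Sep-2015).*)')

# Lookahead plus an explicit DD-Mon-YYYY date format.
STRICT = re.compile(
    r'[0-9]+/[A-Za-z0-9]+_((?!15-Sep-2015))\d{2}-[A-Za-z]{3}-\d{4}'
)

paths = [
    '123/CODE_14-Sep-2015.txt',  # different date
    '123/CODE_15-Sep-2015.txt',  # the date named in the lookahead
    '123/CODE_notes.txt',        # no date at all
]

for path in paths:
    loose = LOOSE.match(path) is not None
    strict = STRICT.match(path) is not None
    print(f'{path}: loose={loose} strict={strict}')
```

Both patterns match the first path and reject the second. Only the loose pattern matches the third, which is the kind of invalid match described above.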


Upvotes: 2
