user1945821

Reputation: 161

Is it possible to exclude from aws S3 sync files older than x time?

I'm trying to use the aws s3 CLI command to sync files from a server to an S3 bucket (and then delete the local copies), but I can't find a way to exclude newly created files that are still in use on the local machine. Any ideas?

Upvotes: 4

Views: 4585

Answers (3)

Rick Yorgason

Reputation: 1666

This should work:

find /path/to/local/SyncFolder -mtime +1 -print0 | sed -z 's/^/--include=/' | xargs -0 /usr/bin/aws s3 sync /path/to/local/SyncFolder s3://remote.sync.folder --exclude '*'

There's a trick here: we're not excluding the files we don't want, we're excluding everything and then including the files we want. Why? Because either way, we're probably going to have too many parameters to fit into the command line. We can use xargs to split up long lines into multiple calls, but we can't let xargs split up our excludes list, so we have to let it split up our includes list instead.

So, starting from the beginning, we have a find command. -mtime +1 finds all the files that are more than a day old (find counts age in whole 24-hour periods, so in practice a file has to be at least two days old to match), and -print0 tells find to delimit each result with a null byte instead of a newline, in case some of your files have newlines in their names.

Next, sed prepends --include= to each filename, and the -z option is included to let sed know to use null bytes instead of newlines as delimiters.
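A quick way to see how -mtime rounds is to try it on a scratch file. This is only a sketch, assuming GNU find and GNU touch; the directory and the file name a.log are made up for the demonstration. The file is 36 hours old, which find treats as one whole day, so -mtime +1 (more than one whole day) skips it, while -mmin +1440 (more than 1440 minutes) matches it.

```shell
# Throwaway directory with one file last modified 36 hours ago (hypothetical name).
demo=$(mktemp -d)
touch -d '36 hours ago' "$demo/a.log"

# Age is truncated to whole 24-hour periods: 36 h counts as 1, and +1 means "more than 1".
by_days=$(find "$demo" -type f -mtime +1)

# -mmin works in minutes, so a 24-hour cutoff is +1440.
by_minutes=$(find "$demo" -type f -mmin +1440)

echo "mtime +1:   ${by_days:-<no match>}"
echo "mmin +1440: $by_minutes"
```

If the cutoff needs to be "anything older than 24 hours", -mmin +1440 is probably the closer fit.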

Finally, xargs will feed all those include options to the end of our aws command, calling aws multiple times if need be. The -0 option is just like sed's -z option, telling it to use null bytes instead of newlines.
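To check what the pipeline will pass to aws before running it for real, something like the following may help. It is a sketch under a few assumptions: GNU find, sed and xargs; a helper name (list_old_includes) and file names invented for the demo; and two small variations on the command above, namely -type f to skip directories and -printf '%P\0' to emit paths relative to the sync folder, which is how I understand the aws filters to be matched. echo stands in for the real aws call.

```shell
# list_old_includes DIR DAYS
# Print one NUL-terminated --include=<relative path> for every regular file
# under DIR that matches find's -mtime +DAYS. (Hypothetical helper name.)
list_old_includes() {
  find "$1" -type f -mtime "+$2" -printf '%P\0' | sed -z 's/^/--include=/'
}

# Demo on a throwaway directory with made-up file names.
demo_dir=$(mktemp -d)
touch -d '3 days ago' "$demo_dir/old.log"
touch "$demo_dir/new.log"

# Preview only: echo prints the command instead of running it.
# -r (GNU xargs) skips the call entirely when nothing matches.
list_old_includes "$demo_dir" 1 |
  xargs -0 -r echo aws s3 sync "$demo_dir" s3://remote.sync.folder --exclude '*'
```

Once the printed command looks right, dropping echo runs the actual sync. Note that aws treats each --include value as a pattern, so file names containing characters such as * or [ may match more than intended.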

Upvotes: 1

kenorb

Reputation: 166457

Most likely, ignoring the newer files is already the default behavior. The aws s3 sync help says:

The default behavior is to ignore same-sized items unless the local version is newer than the S3 version.

If you'd like to change the default behaviour, you can use the following parameters:

  • --size-only (boolean) Makes the size of each key the only criteria used to decide whether to sync from source to destination.

  • --exact-timestamps (boolean) When syncing from S3 to local, same-sized items will be ignored only when the timestamps match exactly. The default behavior is to ignore same-sized items unless the local version is newer than the S3 version.

To see what files are going to be updated, run the sync with --dryrun.

Alternatively, use find to list all the files that need to be excluded, and pass them to the --exclude parameter.
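The find approach could look roughly like this. It is a sketch, not a tested production script: the 60-minute threshold, the file names and the bucket name are placeholders, it assumes a find that supports -mmin (GNU and BSD find do), and it assumes no file name contains a newline. It collects one --exclude option per recently modified file and prints a --dryrun command rather than running it.

```shell
# Throwaway source directory with made-up file names.
src=$(mktemp -d)
touch -d '2 hours ago' "$src/done.log"   # finished file, should be synced
touch "$src/active.log"                  # just modified, should be skipped

# List files modified within the last 60 minutes (assumed threshold).
list=$(mktemp)
find "$src" -type f -mmin -60 > "$list"

# Collect one --exclude=<path relative to src> per listed file
# in the positional parameters, so odd characters stay quoted.
set --
while IFS= read -r path; do
  set -- "$@" "--exclude=${path#"$src"/}"
done < "$list"
rm -f "$list"

# Preview; to run it, call: aws s3 sync "$src" s3://remote.sync.folder --dryrun "$@"
preview="aws s3 sync $src s3://remote.sync.folder --dryrun $*"
echo "$preview"
```

This only stays practical while the number of new files is small, since every exclude has to fit on a single command line.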

Upvotes: 0

reineckm

Reputation: 56

To my knowledge, you can only include/exclude based on filename. So the only way I see is a really dirty hack. You could run a bash script to rename all files newer than your threshold, adding a prefix or postfix such as TOO_NEW_%Filename%, and then run the CLI with:

--exclude 'TOO_NEW_*'

But no, don't do that.

Upvotes: 0
