pkal

Reputation: 45

How to tar files with a size limit and write to a remote location?

I need to move a large number of files to S3 with the timestamps intact (ctime, mtime etc. need to be preserved, which rules out the aws s3 sync command). For this I use the following command:

sudo tar -c --use-compress-program=pigz -f - <folder>/ |  aws s3 cp - s3://<bucket>/<path-to-folder>/

When trying to create a tar.gz file using the above command, for a folder that is 80+GB, I ran into the following error:

upload failed: - to s3://<bucket>/<path-to-folder>/<filename>.tar.gz An error occurred (InvalidArgument) when calling the UploadPart operation: Part number must be an integer between 1 and 10000, inclusive

Upon researching this, I found that there is a limit of roughly 68GB for tar files (the file-size field in the tar header is 12 octal digits, which tops out at 8^12 - 1 bytes, i.e. about 68.7GB).

Upon further research, I also found a solution (here) that shows how to create a set of tar.gz files using split:

tar cvzf - data/ | split --bytes=100GB - sda1.backup.tar.gz.

which can later be untarred with:

cat sda1.backup.tar.gz.* | tar xzvf -

However, split has a different signature: split [OPTION]... [FILE [PREFIX]]

...So the obvious solution:

sudo tar -c --use-compress-program=pigz -f - folder/ | split --bytes=20GB - prefix.tar.gz. | aws s3 cp - s3://<bucket>/<path-to-folder>/

...will not work, since split treats the prefix as a literal string and writes its output to local files named from that prefix, so nothing is left on stdout for aws s3 cp to read.

The question is: is there a way to code this such that I can effectively use a piped solution (i.e., not use additional disk space) and still get a set of files (named prefix.tar.gz.aa, prefix.tar.gz.ab etc.) in S3?
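One possible direction I have not tried yet: GNU split has a --filter option that pipes each chunk to a command (with the chunk's name available as $FILE) instead of writing it to a local file. If I read the man page correctly, something along these lines might work:

sudo tar -c --use-compress-program=pigz -f - folder/ \
  | split --bytes=20GB \
          --filter='aws s3 cp - "s3://<bucket>/<path-to-folder>/$FILE"' \
          - prefix.tar.gz.

In theory each chunk would then land in S3 as prefix.tar.gz.aa, prefix.tar.gz.ab and so on, and the cat ... | tar xzvf - restore above should still apply, but I have not verified this.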

Any pointers would be helpful.

--PK

Upvotes: 1

Views: 1273

Answers (1)

Erwin

Reputation: 952

That looks like a non-trivial challenge. Pseudo-code might look like this:

# Start with an empty list
list = ()
counter = 1
foreach file in folder/ do
  if adding file to list exceeds tar or s3 limits then
    # Flush current list of files to S3
    write list to tmpfile
    run tar czf - --files-from=tmpfile | aws s3 cp - s3://<bucket>/<path-to-file>.<counter>
    list = ()
    counter = counter + 1
  end if
  add file to list
end foreach
if list non-empty
  write list to tmpfile
  run tar czf - --files-from=tmpfile | aws s3 cp - s3://<bucket>/<path-to-file>.<counter>
end if

This uses the --files-from option of tar to avoid having to pass individual files as command-line arguments and running into argument-length limits there.
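As a rough bash sketch of the same idea (the 20GB threshold, archive name, bucket and path below are placeholders, the size accounting ignores tar/compression overhead, and filenames containing newlines are not handled):

#!/bin/bash
# Sketch: batch files into groups of at most ~20GB, stream each batch to S3 as its own tar.gz.
DEST="s3://<bucket>/<path-to-folder>"        # placeholder destination
LIMIT=$((20 * 1024 * 1024 * 1024))           # per-archive size budget in bytes
tmpfile=$(mktemp)
counter=1
batch_size=0

flush() {
  [ -s "$tmpfile" ] || return 0              # nothing queued yet
  tar -c --use-compress-program=pigz --files-from="$tmpfile" -f - \
    | aws s3 cp - "$DEST/backup.tar.gz.$counter"
  : > "$tmpfile"                             # reset the batch
  batch_size=0
  counter=$((counter + 1))
}

# Walk the folder; flush the current batch before a file would push it past the limit.
# Note: a single file larger than LIMIT still ends up in an over-sized archive of its own.
while IFS= read -r file; do
  size=$(stat -c %s "$file")                 # GNU stat
  if [ $((batch_size + size)) -gt "$LIMIT" ]; then
    flush
  fi
  printf '%s\n' "$file" >> "$tmpfile"
  batch_size=$((batch_size + size))
done < <(find folder/ -type f)

flush                                        # upload whatever is left over
rm -f "$tmpfile"

Unlike the split approach in the question, each uploaded part here is a complete, independently extractable tar.gz, so restoring is a matter of downloading and untarring each part on its own.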

Upvotes: 1
