Reputation: 45
I need to move a large number of files to S3 with the timestamps intact (ctime, mtime, etc. need to be preserved, so I cannot use the aws s3 sync command). For this I use the following command:
sudo tar -c --use-compress-program=pigz -f - <folder>/ | aws s3 cp - s3://<bucket>/<path-to-folder>/
When trying to create a tar.gz file using the above command, for a folder that is 80+ GB, I ran into the following error:
upload failed: - to s3://<bucket>/<path-to-folder>/<filename>.tar.gz An error occurred (InvalidArgument) when calling the UploadPart operation: Part number must be an integer between 1 and 10000, inclusive
Upon researching this, I found that there is a limit of 68GB for tar files (the size of the file-size field in the tar header).
Upon further research, I also found a solution (here) that shows how to create a set of tar.gz files using split:
tar cvzf - data/ | split --bytes=100GB - sda1.backup.tar.gz.
which can later be un-tarred with:
cat sda1.backup.tar.gz.* | tar xzvf -
However, split has a different signature: split [OPTION]... [FILE [PREFIX]]
...So the obvious solution:
sudo tar -c --use-compress-program=pigz -f - folder/ | split --bytes=20GB - prefix.tar.gz. | aws s3 cp - s3://<bucket>/<path-to-folder>/
...will not work, since split treats that last argument as a filename prefix and writes its output to a set of files with those names instead of to stdout, so there is nothing for the final pipe stage to read.
The question is: is there a way to code this such that I can effectively use a piped solution (i.e., not use additional disk space) and still get a set of files (called prefix.tar.gz.aa, prefix.tar.gz.ab, etc.) in S3?
Any pointers would be helpful.
--PK
Upvotes: 1
Views: 1273
Reputation: 952
That looks like a non-trivial challenge. Pseudo-code might look like this:
# Start with an empty list
list = ()
counter = 1
foreach file in folder/ do
  if adding file to list exceeds tar or s3 limits then
    # Flush current list of files to S3
    write list to tmpfile
    run tar czf - --files-from=tmpfile | aws s3 cp - s3://<bucket>/<path-to-file>.<counter>
    list = ()
    counter = counter + 1
  end if
  add file to list
end foreach
if list non-empty then
  write list to tmpfile
  run tar czf - --files-from=tmpfile | aws s3 cp - s3://<bucket>/<path-to-file>.<counter>
end if
This uses the --files-from option of tar so that the file list does not have to be passed as individual command-line arguments, which avoids argument-length limitations there.
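A bash sketch of the pseudo-code above. This assumes GNU tar, GNU find/stat, and the aws CLI; the function name `upload_in_batches`, the batch limit, and the bucket/path in the example invocation are placeholders, not tested values:

```shell
#!/usr/bin/env bash
set -euo pipefail

# upload_in_batches <folder> <max-batch-bytes> <upload-cmd>
# Walks <folder>, groups files into batches of at most <max-batch-bytes>
# (by uncompressed size), and streams each batch as a separate tar.gz
# through a pipe to "<upload-cmd>.1", ".2", ... -- no archive touches disk.
upload_in_batches() {
    local src="$1" limit="$2" upload="$3"
    local counter=1 batch_bytes=0 size listfile
    listfile="$(mktemp)"

    flush() {
        [ -s "$listfile" ] || return 0           # nothing queued yet
        # Tar only the listed files (null-delimited, hence --null) and
        # pipe the stream straight into the upload command.
        # $upload is intentionally unquoted so it word-splits into a command.
        tar czf - --null --files-from="$listfile" | $upload.$counter
        : > "$listfile"                          # reset the list
        batch_bytes=0
        counter=$((counter + 1))
    }

    while IFS= read -r -d '' f; do
        size=$(stat -c %s "$f")                  # GNU stat
        # Flush the pending batch before this file would push it over the limit.
        if [ "$batch_bytes" -gt 0 ] && [ $((batch_bytes + size)) -gt "$limit" ]; then
            flush
        fi
        printf '%s\0' "$f" >> "$listfile"
        batch_bytes=$((batch_bytes + size))
    done < <(find "$src" -type f -print0)

    flush                                        # final partial batch
    rm -f "$listfile"
}

# Example invocation (placeholders; keep batches well under the tar limit):
# upload_in_batches <folder> $((20 * 1024 ** 3)) 'aws s3 cp - s3://<bucket>/<path-to-folder>/backup.tar.gz'
```

Because the upload command is passed in as a string and appended with a per-batch counter, each batch still streams through a pipe exactly as in the one-shot command from the question, just capped below the size limits.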
Upvotes: 1