Reputation: 11287
I'd like to split and copy a huge file from a bucket (gs://$SRC_BUCKET/$MY_HUGE_FILE) to another bucket (gs://$DST_BUCKET/), but without downloading the file locally. I expect to do this using only gsutil and shell commands.
I'm looking for something with the same final behaviour as the following commands:
gsutil cp gs://$SRC_BUCKET/$MY_HUGE_FILE my_huge_file_stored_locally
split -l 1000000 my_huge_file_stored_locally a_split_of_my_file_
gsutil -m mv a_split_of_my_file_* gs://$DST_BUCKET/
But because I'm executing these actions on a Compute Engine VM with limited disk storage capacity, downloading the huge file locally is not possible (and it seems like a waste of network bandwidth anyway).
The file in this example is split by number of lines (-l 1000000), but I will also accept answers where the split is done by number of bytes.
I took a look at the docs about streaming uploads and downloads using gsutil to do something like:
gsutil cp gs://$SRC_BUCKET/$MY_HUGE_FILE - | split -l 1000000 | ...
But I can't figure out how to upload the split files directly to gs://$DST_BUCKET/ without creating them locally (creating only one temporary shard at a time for the transfer is OK, though).
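Ideally it would be a single pipeline where each shard is streamed straight back out. Here is a sketch of what I have in mind, assuming GNU split's --filter option is available on the VM (it hands each chunk to a command on stdin instead of writing a file, with the shard name in $FILE) — but I'm not sure whether this is the right approach:
export DST_BUCKET   # so the subshell started by split's --filter can read it
gsutil cp gs://$SRC_BUCKET/$MY_HUGE_FILE - \
  | split -l 1000000 --filter='gsutil cp - "gs://$DST_BUCKET/$FILE"' - a_split_of_my_file_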
Upvotes: 4
Views: 3670
Reputation: 12155
This can't be done without downloading, but you could use range reads to build the pieces without downloading the full file at once, e.g.,
gsutil cat -r 0-10000 gs://$SRC_BUCKET/$MY_HUGE_FILE | gsutil cp - gs://$DST_BUCKET/file1
gsutil cat -r 10001-20000 gs://$SRC_BUCKET/$MY_HUGE_FILE | gsutil cp - gs://$DST_BUCKET/file2
...
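If you need many pieces, that pattern can be scripted. Here's a rough sketch that splits by bytes rather than lines (it assumes gsutil du prints the object's byte size as its first field; the 1 GiB shard size and the file$i names are arbitrary placeholders):
SIZE=$(gsutil du gs://$SRC_BUCKET/$MY_HUGE_FILE | awk '{print $1}')
CHUNK=$((1024 * 1024 * 1024))   # 1 GiB per shard; tune to taste
START=0
i=0
while [ "$START" -lt "$SIZE" ]; do
  END=$((START + CHUNK - 1))
  [ "$END" -ge "$SIZE" ] && END=$((SIZE - 1))   # clamp the last range
  gsutil cat -r "$START-$END" gs://$SRC_BUCKET/$MY_HUGE_FILE \
    | gsutil cp - "gs://$DST_BUCKET/file$i"
  START=$((END + 1))
  i=$((i + 1))
done
Note that byte-based shards won't respect line boundaries, but the question says byte splits are acceptable.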
Upvotes: 4