Racana

Reputation: 327

Google Cloud - Download large file from web

I'm trying to download the GhTorrent dump from http://ghtorrent-downloads.ewi.tudelft.nl/mysql/mysql-2020-07-17.tar.gz, which is about 127 GB.

I tried this in the cloud, but it stops after about 6 GB; I believe there is a size limit when using curl:

curl http://ghtorrent... | gsutil cp - gs://MY_BUCKET_NAME/mysql-2020-07-17.tar.gz

I cannot use Data Transfer, as it requires the URL, the size in bytes (which I have), and the MD5 hash, which I don't have and can only generate by having the file on my disk. I think(?)
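(For what it's worth, an MD5 can in principle be computed while streaming, without the file ever touching the disk. A minimal Python sketch, using an in-memory buffer as a stand-in for a streamed HTTP response body, e.g. `requests.get(url, stream=True).raw`:)

```python
import hashlib
import io

def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Compute an MD5 incrementally from a readable stream.

    The file is processed chunk by chunk, so it never needs to be
    stored on disk or held fully in memory.
    """
    h = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

# Stand-in for a streamed HTTP response (the real download would be 127 GB).
fake_download = io.BytesIO(b"example bytes of a large tarball")
print(md5_of_stream(fake_download))
```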

Is there any other option to download the file and upload it directly to the cloud? My total disk size is only 117 GB. sad beep

Upvotes: 1

Views: 1165

Answers (1)

Neo Anderson

Reputation: 6350

This worked for me with the Storage Transfer Service: https://console.cloud.google.com/transfer/

Have a look at the pricing before moving TBs, especially if your target is Nearline/Coldline: https://cloud.google.com/storage-transfer/pricing


A simple example that copies a file from a public URL to my bucket using a transfer job:

  • Create a file theTsv.tsv that specifies the complete list of files to be copied. This example contains just one file:
TsvHttpData-1.0
http://public-url-pointing-to-the-file
  • Upload theTsv.tsv to your bucket or any other publicly accessible URL. In this example I am storing the .tsv file in my bucket: https://storage.googleapis.com/<my-bucket-name>/theTsv.tsv
  • Create a transfer job and choose List of object URLs as the source
    • Enter the URL that points to theTsv.tsv in the URL of TSV file field
  • Select the target bucket
  • Run immediately
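The URL list from the first step can also be generated programmatically; a minimal sketch that writes a TsvHttpData-1.0 file for the dump from the question (size and MD5 columns are optional in this format):

```python
# Build a TsvHttpData-1.0 URL list for a Storage Transfer Service job.
# Each line after the header is one source URL to copy.
urls = [
    "http://ghtorrent-downloads.ewi.tudelft.nl/mysql/mysql-2020-07-17.tar.gz",
]

lines = ["TsvHttpData-1.0"] + urls
with open("theTsv.tsv", "w") as f:
    f.write("\n".join(lines) + "\n")
```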


My file, named MD5SUB, was copied from the source URL into my bucket, under an identical directory structure.

Upvotes: 3
