JSnow

Reputation: 1029

Downloading folders from Google Cloud Storage Bucket

I'm new to Google Cloud Platform. I have trained my model on Datalab and saved the model folder to Cloud Storage in my bucket. I'm able to download the existing files in the bucket to my local machine by right-clicking on the file --> Save Link As. But when I try to download a folder by the same procedure, I don't get the folder itself, just an image of its icon. Is there any way I can download the whole folder and its contents as they are? Is there a gsutil command to copy folders from Cloud Storage to a local directory?

Upvotes: 38

Views: 74963

Answers (7)

Yi Han

Reputation: 21

As of Mar. 2022, the gs:// path needs to be double-quoted. You can also find the proper download command by navigating to the bucket root, checking one of the directories, and clicking Download at the top.
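For example, taking the same command the other answers here use (bucket and folder names are placeholders), the quoted form looks like this:

gsutil -m cp -r "gs://bucket-name/folder-name" .

Quoting matters in particular when the object path contains spaces or shell wildcard characters.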

Upvotes: 2

Tokci

Reputation: 1280

If you are downloading data from Google Cloud Storage using Python and want to maintain the same folder structure, follow this code I wrote in Python.

OPTION 1

import os
import logging

from google.cloud import storage

def download_from_bucket(bucket_name, blob_path, local_path):
    # Create the destination folder locally
    if not os.path.exists(local_path):
        os.makedirs(local_path)

    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blobs = list(bucket.list_blobs(prefix=blob_path))

    for blob in blobs:
        # path of the blob relative to the prefix we listed;
        # slice the prefix off rather than str.replace(), which
        # would also hit occurrences elsewhere in the name
        relative_path = blob.name[len(blob_path):].lstrip('/')

        # skip "directory" placeholder objects and the prefix itself
        if not relative_path or relative_path.endswith('/'):
            continue

        downloadpath = os.path.join(local_path, relative_path)
        # recreate any intermediate folders from the blob path
        os.makedirs(os.path.dirname(downloadpath), exist_ok=True)
        logging.info(downloadpath)
        blob.download_to_filename(downloadpath)

    logging.info('Blobs under {} downloaded to {}.'.format(blob_path, local_path))


bucket_name = 'google-cloud-storage-bucket-name' # do not use gs://
blob_path = 'training/data' # blob path in bucket where data is stored
local_dir = 'local-folder-name' # trainingData folder in local
download_from_bucket(bucket_name, blob_path, local_dir)

OPTION 2: using gsutil from Python. One more option is to invoke gsutil from a Python program, as below.

import os

def download_bucket_objects(bucket_name, blob_path, local_path):
    # blob_path is the folder name inside the bucket
    command = "gsutil cp -r gs://{bucketname}/{blobpath} {localpath}".format(bucketname = bucket_name, blobpath = blob_path, localpath = local_path)
    os.system(command)
    return command
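A minimal variant, if you prefer to avoid os.system: subprocess.run from the standard library raises on failure and side-steps shell quoting issues. The function name just mirrors the one above; this is a sketch, not part of the original answer.

import subprocess

def download_bucket_objects(bucket_name, blob_path, local_path):
    # build an argument list instead of a shell string; no quoting pitfalls
    command = ["gsutil", "cp", "-r",
               "gs://{}/{}".format(bucket_name, blob_path), local_path]
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command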

OPTION 3: no Python, directly using the terminal and the Google Cloud SDK. Prerequisite: Google Cloud SDK is installed and initialized ($ gcloud init). Refer to the link below for the commands:

https://cloud.google.com/storage/docs/gsutil/commands/cp
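For example, a command of the same shape as OPTIONS 1 and 2 above (using the same placeholder names):

gsutil -m cp -r gs://google-cloud-storage-bucket-name/training/data ./local-folder-name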

Upvotes: 6

njmwas

Reputation: 1231

This is how you can download a folder from Google Cloud Storage Bucket

Run the following command to download the folder from bucket storage to your Google Cloud Console local path:

gsutil -m cp -r gs://{bucketname}/{folderPath} {localpath}

Once you run that command, confirm that your folder is on the local path by running the ls command to list its files and directories.

Now zip your folder by running the command below:

zip -r foldername.zip yourfolder/*

Once the zip process is done, click on the "more" dropdown menu on the right side of the Google Cloud Console,

(screenshot: Google Cloud Console menu)

then select the "Download file" option. You will be prompted to enter the name of the file that you want to download; enter the name of the zip file, "foldername.zip".

Upvotes: 21

Ankit Veer Singh

Reputation: 163

Here's the code I wrote. This will download the complete directory structure to your VM or local storage.

from google.cloud import storage
import os

bucket_name = "ar-data"

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)

dirName = 'Data_03_09/' # ***folder in bucket whose content you want to download
blobs = bucket.list_blobs(prefix=dirName) #, delimiter='/')
destpath = r'/home/jupyter/DATA_test/' # ***path on your VM/local where you want to download the bucket directory
for blob in blobs:
    # slice the prefix off; lstrip() strips characters, not a prefix
    relpath = blob.name[len(dirName):]
    if not relpath or relpath.endswith('/'):
        continue # skip directory placeholder objects
    # recreate any intermediate directories
    currpath = destpath
    for n in relpath.split('/')[:-1]:
        currpath = os.path.join(currpath, n)
        if not os.path.exists(currpath):
            print('creating directory- ', n, 'On path-', currpath)
            os.mkdir(currpath)
    print("downloading ... ", relpath)
    blob.download_to_filename(os.path.join(destpath, relpath))

or simply use the terminal:

gsutil -m cp -r gs://{bucketname}/{folderPath} {localpath}

Upvotes: 0

Pratap Singh

Reputation: 419

gsutil -m cp -r gs://bucket-name "{path to local existing folder}"

Works for sure.

Upvotes: 1

Digimix

Reputation: 323

Prerequisites: Google Cloud SDK is installed and initialized ($ gcloud init)

Command:

gsutil -m cp -r gs://bucket-name .

This will copy all of the files using multiple threads, which is faster. I found that the "dir" argument shown in the official gsutil docs did not work.

Upvotes: 14

Matthias Baetens

Reputation: 1553

You can find docs on the gsutil tool at https://cloud.google.com/storage/docs/gsutil, and for your question more specifically at https://cloud.google.com/storage/docs/gsutil/commands/cp.

The command you want to use is:

gsutil cp -r gs://bucket/folder .
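For large folders, the top-level -m flag (used in several of the other answers here) runs the copy in parallel:

gsutil -m cp -r gs://bucket/folder .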

Upvotes: 51
