S.G.

Reputation: 367

Walking a directory tree inside a Google Cloud Platform bucket in Python

For directories on a local machine, the os.walk() function is commonly used to walk a directory tree in Python.

Google has a Python module (google.cloud.storage) for uploading to and downloading from a GCP bucket in a locally-run Python script.

I need a way to walk directory trees in a GCP bucket. I browsed through the classes in the google.cloud Python module, but could not find anything. Is there a way to perform something similar to os.walk() on directories inside a GCP bucket?

Upvotes: 7

Views: 2952

Answers (2)

Alon Barad

Reputation: 1991

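import os
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('bucket_name')

for blob in bucket.list_blobs(prefix=''):
    # Skip zero-byte "folder" placeholder objects whose names end with '/'
    if blob.name.endswith('/'):
        continue

    # Create the local parent directories that the object name implies
    local_dir = os.path.dirname(blob.name)
    if local_dir:
        os.makedirs(local_dir, exist_ok=True)

    # Download the file
    with open(blob.name, 'wb') as file_obj:
        client.download_blob_to_file(blob, file_obj)

    # Your logic on the file
    # logic goes here

    # Remove the local file
    os.remove(blob.name)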

Upvotes: 1

Brandon Yarbrough

Reputation: 38379

No such function exists in the GCS library. GCS has a flat namespace, so "directories" are just shared prefixes in object names. However, GCS can list objects by prefix, which is usually close enough:

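from google.cloud import storage

bucket_name = "bucket_name"  # placeholder: replace with your bucket's name

bucket = storage.Client().get_bucket(bucket_name)
# Every object whose name starts with "dir1/", at any depth
for blob in bucket.list_blobs(prefix="dir1/"):
    print(blob.name)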
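
If you need the (dirpath, dirnames, filenames) shape that os.walk() produces, one option is to group the flat object names yourself. The following is a minimal sketch, not something the GCS library provides: walk_names and walk_bucket are hypothetical helper names, and the sketch assumes "/" is the separator in object names and treats names ending in "/" as empty folder placeholders.

```python
from collections import defaultdict


def walk_names(names, top=""):
    """Yield (dirpath, dirnames, filenames) tuples, os.walk() style, from flat object names."""
    top = top.strip("/")
    depth = top.count("/") + 1 if top else 0  # number of path components in top
    subdirs = defaultdict(set)   # dirpath -> names of immediate subdirectories
    files = defaultdict(list)    # dirpath -> names of files directly inside it

    for name in names:
        if top and not name.startswith(top + "/"):
            continue  # object lies outside the requested subtree
        is_placeholder = name.endswith("/")
        parts = name.strip("/").split("/")
        dir_parts = parts if is_placeholder else parts[:-1]
        # Register each directory level between top and the object itself
        for i in range(depth, len(dir_parts)):
            subdirs["/".join(dir_parts[:i])].add(dir_parts[i])
        if not is_placeholder:
            files["/".join(dir_parts)].append(parts[-1])

    # Top-down traversal, visiting subdirectories in sorted order
    stack = [top]
    while stack:
        dirpath = stack.pop()
        dirnames = sorted(subdirs.get(dirpath, ()))
        yield dirpath, dirnames, sorted(files.get(dirpath, ()))
        for d in reversed(dirnames):
            stack.append(f"{dirpath}/{d}" if dirpath else d)


def walk_bucket(bucket_name, top=""):
    """Hypothetical wrapper: feed the object names of a bucket into walk_names()."""
    # Imported lazily so walk_names() stays usable without the GCS client installed
    from google.cloud import storage

    top = top.strip("/")
    prefix = top + "/" if top else ""
    blobs = storage.Client().list_blobs(bucket_name, prefix=prefix)
    return walk_names((blob.name for blob in blobs), top)
```

Usage would look like `for dirpath, dirnames, filenames in walk_bucket(bucket_name, "dir1/"): ...`. Note that this lists every object under the prefix before yielding anything, so unlike os.walk() it cannot prune subtrees early; for very large buckets that cost may matter.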

Upvotes: 6
