I have a storage bucket in Google Cloud, with a few directories I created that contain files.
I know that if I want to cycle through all the files in one of the directories, I can use the following code:
for file in source_bucket.list_blobs(prefix='subdir/subdir2'):
    file_path = f"gs://{file.bucket.name}/{file.name}"
    print(file_path)
However, the results also include the directory path itself:
gs://bucket-name/subdir/subdir2 <----- this item
gs://bucket-name/subdir/subdir2/file1
gs://bucket-name/subdir/subdir2/file2
....
Is there a way to cycle through the directory without the directory itself appearing, so that the output looks like this?
gs://bucket-name/subdir/subdir2/file1
gs://bucket-name/subdir/subdir2/file2
....
I managed to do this:
subdir = 'subdir/subdir2/'
for file in source_bucket.list_blobs(prefix=subdir):
    if file.name == subdir:
        continue  # skip the entry for the directory itself
    file_path = f"gs://{file.bucket.name}/{file.name}"
    print(file_path)
But is there a cleaner way to do this with the Google Cloud Storage API? I looked through the documentation but didn't find anything like that.
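A slightly tidier variant of the same idea, sketched as a pure function so it can be tried without a bucket. It assumes that any object whose name ends in "/" is a "folder" placeholder rather than a real file, so unlike the exact-name check above it also skips placeholders for deeper sub-folders. The helper name `file_paths` and the sample names are hypothetical.

```python
from typing import Iterable, Iterator


def file_paths(bucket_name: str, object_names: Iterable[str]) -> Iterator[str]:
    """Yield gs:// URIs, skipping names that end in "/".

    Assumption: a name ending in "/" is a placeholder, never a real file.
    """
    for name in object_names:
        if name.endswith("/"):
            continue  # placeholder such as "subdir/subdir2/"
        yield f"gs://{bucket_name}/{name}"


# Hypothetical names standing in for what list_blobs(prefix=...) returns.
# With the real google-cloud-storage client they could come from:
#   names = (blob.name for blob in source_bucket.list_blobs(prefix="subdir/subdir2/"))
names = [
    "subdir/subdir2/",
    "subdir/subdir2/file1",
    "subdir/subdir2/file2",
]

for path in file_paths("bucket-name", names):
    print(path)
```

This is still client-side filtering; it only moves the check into one reusable place.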
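Cloud Storage does not actually have directories; it has a flat namespace. The Console only makes it look hierarchical because objects are named with a pattern similar to a file system path. So when you request all the objects in a specific "folder", you are really requesting all objects whose names start with the same prefix, and you get the entire "sub-hierarchy" back. The entry for the directory itself is typically a zero-byte placeholder object whose name ends in a slash (e.g. subdir/subdir2/), which the Console creates when you create a folder; it matches the prefix like any other object.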
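To make the "flat namespace" point concrete, here is a small local model of a prefix listing. It is only a sketch: the bucket is a plain list of hypothetical object names, the first of which is assumed to be a zero-byte folder placeholder, and the filter is plain string prefix matching rather than the client library.

```python
def list_by_prefix(object_names, prefix):
    """Model a flat-namespace listing: every name that starts with prefix.

    Results are sorted lexicographically, mirroring the name order of a listing.
    """
    return sorted(name for name in object_names if name.startswith(prefix))


# Hypothetical bucket contents: one flat list of names, no real directories.
bucket = [
    "subdir/subdir2/",             # assumed zero-byte "folder" placeholder
    "subdir/subdir2/file1",
    "subdir/subdir2/file2",
    "subdir/subdir2/deeper/file3", # nested names match the prefix too
    "subdir/subdir2_old/file5",    # matches only if the prefix lacks the trailing "/"
    "subdir/other/file4",
]

for name in list_by_prefix(bucket, "subdir/subdir2/"):
    print(name)
```

Because matching is purely on the string, the bare prefix subdir/subdir2 (as in the first loop in the question) would also pick up subdir/subdir2_old/file5; ending the prefix with "/" avoids that.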
You can check https://cloud.google.com/storage/docs/naming-objects for more information. This is the relevant bit:
Object names reside in a flat namespace within a bucket. This means that:
- Different buckets can have objects with the same name.
- Objects do not reside within subdirectories in a bucket.
For example, you can name an object /europe/france/paris.jpg to make it appear that paris.jpg resides in the subdirectory /europe/france, but to Cloud Storage, the object simply exists in the bucket and has the name /europe/france/paris.jpg. As a result, while deeply nested, directory-like structures using slash delimiters are possible within Cloud Storage, they don't have the performance that a native filesystem has when listing deeply nested sub-directories.
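Directory-like browsing on top of a flat namespace comes from listing with a delimiter: `list_blobs` in the Python client accepts a `delimiter` argument (typically "/") for this, and the grouping happens server-side. The function below is only a local model of the idea, with hypothetical names: names that continue past the prefix with another delimiter are collapsed into one "sub-folder" prefix, and the rest are treated as direct children.

```python
def list_with_delimiter(object_names, prefix, delimiter="/"):
    """Model a delimiter listing: direct children plus collapsed sub-prefixes."""
    objects, prefixes = [], set()
    for name in sorted(object_names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            # No delimiter after the prefix: a direct "child" object.
            objects.append(name)
        else:
            # Deeper names collapse to a single "sub-folder" prefix.
            prefixes.add(prefix + rest[: cut + len(delimiter)])
    return objects, sorted(prefixes)


# Hypothetical object names; the first is an assumed placeholder object.
names = [
    "subdir/subdir2/",
    "subdir/subdir2/file1",
    "subdir/subdir2/file2",
    "subdir/subdir2/deeper/file3",
    "subdir/subdir2/deeper/file4",
]

objects, prefixes = list_with_delimiter(names, "subdir/subdir2/")
print(objects)
print(prefixes)
```

In this model the placeholder still lands among the direct objects, because nothing follows the prefix in its name: delimiter grouping hides deeper files but does not by itself drop the placeholder, so a name check is still needed.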