Reputation: 393
I would like to get only the first level of a fake folder structure on GCS.
If I run e.g.:
gsutil ls 'gs://gcp-public-data-sentinel-2/tiles/'
I get a list like this:
gs://gcp-public-data-sentinel-2/tiles/01/
gs://gcp-public-data-sentinel-2/tiles/02/
gs://gcp-public-data-sentinel-2/tiles/03/
gs://gcp-public-data-sentinel-2/tiles/04/
gs://gcp-public-data-sentinel-2/tiles/05/
gs://gcp-public-data-sentinel-2/tiles/06/
gs://gcp-public-data-sentinel-2/tiles/07/
gs://gcp-public-data-sentinel-2/tiles/08/
gs://gcp-public-data-sentinel-2/tiles/09/
gs://gcp-public-data-sentinel-2/tiles/10/
gs://gcp-public-data-sentinel-2/tiles/11/
gs://gcp-public-data-sentinel-2/tiles/12/
gs://gcp-public-data-sentinel-2/tiles/13/
gs://gcp-public-data-sentinel-2/tiles/14/
gs://gcp-public-data-sentinel-2/tiles/15/
...
Running code like the following with the Python API gives me an empty result:
from google.cloud import storage
bucket_name = 'gcp-public-data-sentinel-2'
prefix = 'tiles/'
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
for blob in bucket.list_blobs(max_results=10, prefix=prefix,
                              delimiter='/'):
    print(blob.name)
If I don't use the delimiter option, I get all the objects in the bucket, which is not very useful.
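Why is the result empty? With delimiter='/', a listing returns as blobs only the objects whose names contain no further '/' after the prefix; everything deeper is collapsed into a "prefix" entry instead. Under tiles/ every object sits in a sub-folder, so there are no blobs to print. The following is a minimal offline sketch that mimics these semantics on a plain list of names; the function name and sample object names are hypothetical, and it does not call Cloud Storage.

```python
def simulate_delimited_listing(names, prefix="", delimiter="/"):
    """Mimic how a GCS listing with prefix/delimiter splits object names.

    Returns (blob_names, prefixes): objects that sit directly under `prefix`,
    and the de-duplicated "sub-folder" prefixes, both sorted.
    Local illustration only; nothing here talks to Cloud Storage.
    """
    blob_names = []
    prefixes = set()
    for name in names:
        if not name.startswith(prefix):
            continue  # outside the requested prefix entirely
        rest = name[len(prefix):]
        idx = rest.find(delimiter) if delimiter else -1
        if idx == -1:
            # No delimiter after the prefix: returned as a regular blob.
            blob_names.append(name)
        else:
            # Everything up to and including the first delimiter is rolled up.
            prefixes.add(prefix + rest[: idx + len(delimiter)])
    return sorted(blob_names), sorted(prefixes)


# Hypothetical object names shaped like the Sentinel-2 bucket layout.
names = [
    "tiles/01/C/CV/a.jp2",
    "tiles/01/C/CV/b.jp2",
    "tiles/02/E/FG/c.jp2",
    "index.csv.gz",
]
print(simulate_delimited_listing(names, prefix="tiles/", delimiter="/"))
# ([], ['tiles/01/', 'tiles/02/'])
print(simulate_delimited_listing(names, prefix="tiles/", delimiter="//")[0])
# ['tiles/01/C/CV/a.jp2', 'tiles/01/C/CV/b.jp2', 'tiles/02/E/FG/c.jp2']
```

The second call also illustrates why a delimiter that never occurs in any object name (such as '//') produces a fully recursive listing: nothing gets rolled up into a prefix.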
Upvotes: 7
Views: 16691
Reputation: 841
Here is a faster way (found this in this GitHub thread, posted by @evanj):
def list_gcs_directories(bucket, prefix):
    iterator = bucket.list_blobs(prefix=prefix, delimiter='/')
    prefixes = set()
    # Each page carries the "sub-folder" prefixes returned by that API response
    for page in iterator.pages:
        prefixes.update(page.prefixes)
    return prefixes
Call this function as follows:
from google.cloud import storage

client = storage.Client()
bucket_name = 'my_bucket_name'
bucket_obj = client.bucket(bucket_name)
list_folders = list_gcs_directories(bucket_obj,
                                    prefix='my/prefix/path/within/bucket/')
# Getting rid of the prefix: each entry ends with '/', so take the last
# non-empty path segment ('my/prefix/path/within/bucket/sub/' -> 'sub')
list_folders = [indiv_folder.rstrip('/').split('/')[-1]
                for indiv_folder in list_folders]
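The two steps above (collect prefixes page by page, then strip the parent prefix) can be folded into one helper. This is a sketch under stated assumptions: `list_subfolder_names` is a hypothetical name, and `bucket` is assumed to be a google.cloud.storage Bucket whose list_blobs iterator exposes `.pages`, each with a `.prefixes` collection, as the function above relies on. Stripping by the prefix length rather than splitting on '/' keeps the result relative to the requested prefix.

```python
def list_subfolder_names(bucket, prefix):
    """Return the immediate "sub-folder" names under `prefix`, sorted.

    Assumes `bucket` behaves like a google.cloud.storage Bucket: its
    list_blobs(prefix=..., delimiter=...) iterator exposes `.pages`, and each
    page has a `.prefixes` collection. `prefix` should end with "/".
    """
    iterator = bucket.list_blobs(prefix=prefix, delimiter="/")
    names = set()
    for page in iterator.pages:  # walk every page, not just the first
        for full_prefix in page.prefixes:
            # "my/prefix/path/sub/" -> "sub"
            names.add(full_prefix[len(prefix):].rstrip("/"))
    return sorted(names)
```

With a real client this would be called as `list_subfolder_names(storage.Client().bucket('my_bucket_name'), 'my/prefix/path/within/bucket/')`.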
Upvotes: 2
Reputation: 305
Here is an answer that works.
To get a simple listing of one "directory" level in a Google Cloud Storage bucket (a directory here is just a shared name prefix, not a blob):
Sample path: 'gs://BUCKET_A/FOLDER_1/FOLDER_2/FILE_10.txt'
Function to use: list_blobs
Parameters to pass to list_blobs:
- bucket_name: name of the storage bucket, e.g. "BUCKET_A"
- prefix: e.g. "FOLDER_1/FOLDER_2/" (keep the trailing '/'; without it you get the prefix "FOLDER_1/FOLDER_2/" itself back rather than its contents)
- delimiter: the listing does not go past this character. For a simple one-level listing, the delimiter has to be '/'. Objects whose path continues past another '/' belong to a deeper level, so they are not returned as blobs; only their next-level folder is reported, in blobs.prefixes.
Sample code:
from google.cloud import storage

bucket_name = "BUCKET_A"
prefix = "FOLDER_1/FOLDER_2/"
delimiter = "/"

storage_client = storage.Client()
# Note: Client.list_blobs requires at least package version 1.17.0.
blobs = storage_client.list_blobs(bucket_name, prefix=prefix, delimiter=delimiter)
# Note: The call returns a response only when the iterator is consumed.
print("Blobs:")
for blob in blobs:
    print(blob.name)
if delimiter:
    print("Prefixes:")
    for sub_prefix in blobs.prefixes:
        print(sub_prefix)
To achieve what we need:
- pass "/" as the delimiter so the listing does not go beyond the current directory;
- read blobs.prefixes to get the sub-folders, but only after the iterator has been consumed.
In summary, access the files by iterating over blobs, and access the sub-folders by iterating over blobs.prefixes.
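The ordering requirement above (consume the iterator first, then read its prefixes) can be wrapped in a small helper. This is a sketch, not the library's own API: `list_one_level` is a hypothetical name, and `client` is assumed to behave like google.cloud.storage.Client, whose list_blobs iterator fills in `.prefixes` only while it is being consumed.

```python
def list_one_level(client, bucket_name, prefix):
    """Return (file_names, folder_prefixes) directly under `prefix`.

    Assumes `client` behaves like google.cloud.storage.Client: list_blobs
    returns an iterator whose `.prefixes` set is populated only as the
    iterator is consumed.
    """
    iterator = client.list_blobs(bucket_name, prefix=prefix, delimiter="/")
    # Consuming the iterator is what triggers the API requests.
    file_names = [blob.name for blob in iterator]
    # Only now is iterator.prefixes complete.
    folder_prefixes = sorted(iterator.prefixes)
    return file_names, folder_prefixes
```

Reading `iterator.prefixes` before the list comprehension would typically give an empty set, because no page has been fetched yet.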
Upvotes: 0
Reputation: 3223
If you find this question after a long time, as I did: currently (google-cloud-storage 2.1.0) you can list the bucket contents using '//' as the delimiter instead of '/'. However, this lists "recursively" all the way down to the actual blobs (GCS is not a real file system).
Upvotes: 0
Reputation: 3325
Maybe not the best way, but inspired by this comment on the official repository:
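iterator = bucket.list_blobs(delimiter='/', prefix=prefix)
# Note: _get_next_page_response() is a private method and fetches only the first page
response = iterator._get_next_page_response()
for sub_prefix in response.get('prefixes', []):
    print('gs://' + bucket_name + '/' + sub_prefix)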
Gives:
gs://gcp-public-data-sentinel-2/tiles/01/
gs://gcp-public-data-sentinel-2/tiles/02/
gs://gcp-public-data-sentinel-2/tiles/03/
gs://gcp-public-data-sentinel-2/tiles/04/
gs://gcp-public-data-sentinel-2/tiles/05/
gs://gcp-public-data-sentinel-2/tiles/06/
gs://gcp-public-data-sentinel-2/tiles/07/
gs://gcp-public-data-sentinel-2/tiles/08/
gs://gcp-public-data-sentinel-2/tiles/09/
gs://gcp-public-data-sentinel-2/tiles/10/
...
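The approach above relies on the private `_get_next_page_response()`, which only returns the first page of results. A sketch that produces the same kind of gs:// listing through the iterator's public `pages` property follows; `list_folder_uris` is a hypothetical name, and `bucket` is assumed to be a google.cloud.storage Bucket (its `.name` is the bucket name, and each page exposes `.prefixes`).

```python
def list_folder_uris(bucket, prefix):
    """Return sorted gs:// URIs for the immediate "sub-folders" under `prefix`.

    Uses only the public `pages` property, so every page is fetched rather
    than just the first. Assumes `bucket` behaves like a
    google.cloud.storage Bucket.
    """
    iterator = bucket.list_blobs(prefix=prefix, delimiter="/")
    uris = set()  # a set guards against a prefix being reported twice
    for page in iterator.pages:
        for sub_prefix in page.prefixes:
            uris.add(f"gs://{bucket.name}/{sub_prefix}")
    return sorted(uris)
```

For the question's bucket this would be called as `list_folder_uris(bucket, 'tiles/')`.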
Upvotes: 7