SARIM

Reputation: 1132

How can I use wildcards in my GCP bucket object paths?

My main problem is that I want to check whether an object in GCP exists. Here is what I tried:

from google.cloud import storage
client = storage.Client()
path_exists = False
for blob in client.list_blobs('models', prefix='trainedModels/mddeep256_sarim'):
    path_exists = True
    break

It worked fine for me. But now the problem is that I don't know the model name (here mddeep256); I only know the last part of it, _sarim.

So I want to use something like:

for blob in client.list_blobs('models', prefix='trainedModels/*_sarim'):

I want to use the * wildcard. How can I do that?

Upvotes: 4

Views: 3821

Answers (2)

Gaurang Shah

Reputation: 12910

list_blobs doesn't support regex in prefix. You need to filter the results yourself, as Guillaume mentioned.

The following should work:

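import re
from google.cloud import storage


def is_object_exist(bucket_name, object_pattern, prefix=None):
    """Return True if any object name in the bucket matches object_pattern (a regex)."""
    client = storage.Client()
    regex = re.compile(object_pattern)
    # list_blobs only filters server-side on a literal prefix; the regex is applied client-side.
    blobs = client.list_blobs(bucket_name, prefix=prefix)
    # any() stops at the first match instead of building a list of every blob.
    return any(regex.match(blob.name) for blob in blobs)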
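Note that `is_object_exist` expects a regular expression, not a shell-style glob, so passing the question's `trainedModels/*_sarim` literally will not match what the asker wants. The small offline sketch below uses made-up object names (not real bucket contents) to show the difference, and one way to convert a glob with the standard library's `fnmatch.translate`:

```python
import fnmatch
import re

# Hypothetical object names, standing in for blob.name values from the bucket.
object_names = [
    "trainedModels/mddeep256_sarim/model.pkl",
    "trainedModels/other_model/model.pkl",
]

# Read as a regex, "/*" means "zero or more slashes", so this looks for
# "trainedModels" followed directly by "_sarim" and matches nothing here.
naive = re.compile("trainedModels/*_sarim")
print(any(naive.match(name) for name in object_names))

# Writing the regex explicitly: ".*" is the regex equivalent of the glob "*".
explicit = re.compile(r"trainedModels/.*_sarim")
print(any(explicit.match(name) for name in object_names))

# Or translate a glob. fnmatch.translate anchors the end of the string, so a
# trailing "*" is added to allow anything after "_sarim" (e.g. "/model.pkl").
translated = re.compile(fnmatch.translate("trainedModels/*_sarim*"))
print(any(translated.match(name) for name in object_names))
```

With the asker's bucket, that would mean calling something like `is_object_exist('models', r'trainedModels/.*_sarim', prefix='trainedModels/')`, where the `prefix` argument narrows the listing before the regex is applied.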

Upvotes: 4

guillaume blaquiere

Reputation: 75735

In short: you can't!

You can only filter on the prefix. If you want to filter on something after the prefix (as you do), first filter on the longest prefix you can with the API, then iterate over the results in your code and keep the object names that match your pattern.

There is no built-in solution for that.
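A minimal sketch of that approach, assuming shell-style glob patterns: take the literal part of the pattern before the first wildcard as the prefix for the API call, then filter the returned names client-side with `fnmatch.fnmatchcase` (case-sensitive, like GCS object names). The helper names `literal_prefix` and `filter_names` are hypothetical, and the name list is a made-up stand-in for what the bucket would return:

```python
import fnmatch

GLOB_CHARS = "*?["


def literal_prefix(pattern):
    """Return the part of a glob pattern before its first wildcard character."""
    for i, ch in enumerate(pattern):
        if ch in GLOB_CHARS:
            return pattern[:i]
    return pattern


def filter_names(names, pattern):
    """Yield the names matching the glob (case-sensitive; '*' also matches '/')."""
    for name in names:
        if fnmatch.fnmatchcase(name, pattern):
            yield name


pattern = "trainedModels/*_sarim*"
prefix = literal_prefix(pattern)  # the longest prefix the API can filter on
print(prefix)

# Hypothetical stand-in for the names from client.list_blobs("models", prefix=prefix).
names = [
    "trainedModels/mddeep256_sarim/model.pkl",
    "trainedModels/other_model/model.pkl",
]
print(list(filter_names(names, pattern)))
```

Against the real bucket, the existence check would then be something like `any(filter_names((b.name for b in client.list_blobs("models", prefix=prefix)), pattern))`. Note that, unlike a shell, `fnmatch` lets `*` match across `/`, which is convenient here but broader than a path-aware glob.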

Upvotes: 2
