I am new to Google Cloud Platform and have the following issue: my Google Cloud Storage bucket contains 5 folders, each holding 100 audio files (.wav), and I want to access each file and convert it from speech to text.
I have managed to do the second part with Google's Speech-to-Text API, but only for one specific gcs_uri path, e.g. gcs_uri = "gs://my_bucket/1/6965842449357946277.wav".
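The example URI names a single bucket, `my_bucket`. Cloud Storage has no real directories: `1/6965842449357946277.wav` is the object's full name, and the "folder" `1/` is just a prefix of that name. The helpers below are hypothetical illustrations (not part of the Google client libraries) that split a `gs://` URI into bucket name and object name, and build one back from those two parts.

```python
def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Everything before the first '/' after the scheme is the bucket name;
    # the rest (including any "folder" part) is the object name.
    bucket_name, _, object_name = uri[len(scheme):].partition("/")
    return bucket_name, object_name


def make_gcs_uri(bucket_name: str, object_name: str) -> str:
    """Inverse of split_gcs_uri: build the URI form Speech-to-Text accepts."""
    return f"gs://{bucket_name}/{object_name}"


bucket_name, object_name = split_gcs_uri("gs://my_bucket/1/6965842449357946277.wav")
print(bucket_name)  # my_bucket
print(object_name)  # 1/6965842449357946277.wav
```

This is also why a call like `storage.Client().bucket("gs://my_bucket/1")` does not point at the folder: `bucket()` expects only the bucket name, such as `"my_bucket"`.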
I want to use all 500 .wav files as gcs_uri, but I'm not sure how to iterate through every .wav file in every folder. This is what I have tried so far:
import os
from google.cloud import speech_v1p1beta1 as speech
from google.cloud import storage
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="C:/Users/me/project/key.json"
client = speech.SpeechClient()
bucket1 = storage.Client().bucket("gs://my_bucket/1")
bucket2 = storage.Client().bucket("gs://my_bucket/2")
bucket3 = storage.Client().bucket("gs://my_bucket/3")
bucket4 = storage.Client().bucket("gs://my_bucket/4")
bucket5 = storage.Client().bucket("gs://my_bucket/5")
print("Bucket name: {}".format(bucket1))
blobs = bucket1.list_blobs()
print("Blob name: {}".format(blobs))
Output:
Bucket name: <Bucket: gs://my_bucket/1>
Blob name: <google.api_core.page_iterator.HTTPIterator object at 0x000002283FC4AAF0>
Can anyone help?
Create a function that takes the bucket name and iterates over its objects with the list_blobs method, for example:
from google.cloud import storage

def hello_gcs(bucket_name):
    client = storage.Client()
    # bucket_name is just the bucket's name (e.g. "my_bucket"), not a gs:// URI
    blobs = client.list_blobs(bucket_name)
    for blob in blobs:
        # list_blobs already yields Blob objects, so no extra get_blob() call is needed
        if blob.name.endswith('.wav'):
            print("Blob name is {}".format(blob.name))
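Building on hello_gcs, here is a sketch of the full loop the question asks for: list every .wav object under each folder prefix, turn each one into a gs:// URI, and send it to Speech-to-Text. Assumptions to check before relying on it: the bucket is `my_bucket` with folders `1/` to `5/` (taken from the question), the files are mono WAV, the speech is English (`language_code="en-US"` is a guess), and each clip is short enough for the synchronous `recognize` call (about one minute or less; longer audio needs `long_running_recognize`). The function names are hypothetical. Clients are passed in, and the Google imports happen inside functions, so the listing logic can be exercised without credentials.

```python
from typing import Iterable, Iterator


def iter_wav_uris(storage_client, bucket_name: str,
                  prefixes: Iterable[str]) -> Iterator[str]:
    """Yield a gs:// URI for every .wav object under each folder prefix."""
    for prefix in prefixes:
        # prefix="1/" limits the listing to object names starting with "1/",
        # which is how Cloud Storage emulates folders.
        for blob in storage_client.list_blobs(bucket_name, prefix=prefix):
            # Case-insensitive match on the file extension.
            if blob.name.lower().endswith(".wav"):
                yield f"gs://{bucket_name}/{blob.name}"


def transcribe_uri(speech_client, gcs_uri: str, language_code: str = "en-US") -> str:
    """Transcribe one short audio file stored in Cloud Storage."""
    # Imported here so iter_wav_uris works even without the Speech library installed.
    from google.cloud import speech_v1p1beta1 as speech

    # For WAV files the encoding and sample rate can be read from the file header,
    # so only the (assumed) language is set here.
    config = speech.RecognitionConfig(language_code=language_code)
    audio = speech.RecognitionAudio(uri=gcs_uri)
    response = speech_client.recognize(config=config, audio=audio)
    # Each result covers a consecutive chunk of audio; keep its top alternative.
    return " ".join(r.alternatives[0].transcript for r in response.results)


def transcribe_bucket(bucket_name: str = "my_bucket") -> dict:
    """Map every .wav URI in folders 1/ to 5/ to its transcript."""
    from google.cloud import speech_v1p1beta1 as speech
    from google.cloud import storage

    storage_client = storage.Client()
    speech_client = speech.SpeechClient()
    prefixes = [f"{i}/" for i in range(1, 6)]
    return {
        uri: transcribe_uri(speech_client, uri)
        for uri in iter_wav_uris(storage_client, bucket_name, prefixes)
    }
```

With GOOGLE_APPLICATION_CREDENTIALS set as in the question, calling transcribe_bucket() would return a dict keyed by URI. The trailing slash in each prefix matters: prefix="1" would also match objects in a folder named `10/`.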