joasa

Reputation: 976

Python: Read all files as gcs_uri in Google Cloud Storage

I am new to Google Cloud Platform and have the following problem: my Cloud Storage bucket contains 5 folders, each holding 100 audio files (.wav). I want to access every file and convert its speech to text.

I have managed the second part using Google's Speech-to-Text API, but only for a single, hard-coded gcs_uri path:

(e.g. gcs_uri = "gs://my_bucket/1/6965842449357946277.wav")

I want to be able to use all 500 .wav files as gcs_uri values, but I'm not sure how to iterate over every .wav file in every folder. This is what I have tried so far:

import os

from google.cloud import speech_v1p1beta1 as speech
from google.cloud import storage

os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="C:/Users/me/project/key.json"

client = speech.SpeechClient()

bucket1 = storage.Client().bucket("gs://my_bucket/1")
bucket2 = storage.Client().bucket("gs://my_bucket/2")
bucket3 = storage.Client().bucket("gs://my_bucket/3")
bucket4 = storage.Client().bucket("gs://my_bucket/4")
bucket5 = storage.Client().bucket("gs://my_bucket/5")

print("Bucket name: {}".format(bucket1))

blobs = bucket1.list_blobs()
print("Blob name: {}".format(blobs))

Bucket name: <Bucket: gs://my_bucket/1>
Blob name: <google.api_core.page_iterator.HTTPIterator object at 0x000002283FC4AAF0>

Can anyone help?

Upvotes: 1

Views: 3871

Answers (1)

CaioT

Reputation: 2211

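Create a function that takes the bucket name and iterates over the objects returned by the list_blobs method. Pass the bare bucket name (e.g. "my_bucket"), not a gs:// path; the folders are only prefixes in the object names, so a single listing covers all of them. For example: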

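from google.cloud import storage

def hello_gcs(bucket_name):
    # bucket_name is the bare bucket name (e.g. "my_bucket"), not a gs:// URI
    client = storage.Client()

    # list_blobs returns an iterator over every object in the bucket
    for blob in client.list_blobs(bucket_name):
        # Each listed blob already carries its name, so no extra get_blob call is needed
        if blob.name.endswith('.wav'):
            print("Blob name is {}".format(blob.name))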
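
To get from blob names to the gcs_uri strings the question asks about, something like the following might work. It is only a sketch: list_wav_uris and its prefix parameter are names made up here for illustration, and the client argument is assumed to be a google.cloud.storage.Client (or any object with a compatible list_blobs method).

```python
def list_wav_uris(client, bucket_name, prefix=""):
    """Return gs:// URIs for every .wav object under `prefix` in the bucket.

    `client` is assumed to be a google.cloud.storage.Client; `bucket_name`
    is the bare bucket name (e.g. "my_bucket"), not a gs:// path.
    """
    uris = []
    # An empty prefix lists the whole bucket; "1/" would limit it to folder 1
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith(".wav"):
            # blob.name already contains the folder part, e.g. "1/123.wav"
            uris.append("gs://{}/{}".format(bucket_name, blob.name))
    return uris


# Example usage (needs google-cloud-storage installed and credentials set):
#   from google.cloud import storage
#   for gcs_uri in list_wav_uris(storage.Client(), "my_bucket"):
#       print(gcs_uri)
```

Each returned string has the same shape as the hard-coded gcs_uri in the question, so it can be passed to the existing speech-to-text call inside the loop.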

Upvotes: 2
