I have an apache-beam==2.3.0 pipeline written with the Python SDK that works locally with the DirectRunner. When I switch the runner to DataflowRunner, I get an error saying that the name 'storage' is not defined globally.
Checking my code, I think it's because I'm relying on the credentials stored in my environment. In my Python code I just do:
class ExtractBlobs(beam.DoFn):
    def process(self, element):
        from google.cloud import storage
        client = storage.Client()
        yield list(client.get_bucket(element).list_blobs(max_results=100))
The real issue is that I need the client so I can get the bucket, and the bucket so I can list its blobs. Everything I'm doing here is just to list the blobs.
So if anyone can either point me in the right direction toward using storage.Client() in Dataflow, or show how to list the blobs of a GCS bucket without needing the client, I'd appreciate it.
Thanks in advance!
What I've read: https://cloud.google.com/docs/authentication/production#auth-cloud-implicit-python
Upvotes: 0
Views: 754
Fixed: Upon further reading and investigating, it turns out I have the required libraries installed locally, but Dataflow needs to know about them so it can install them on the workers it spins up. See https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
So all I've done is create a requirements.txt file containing my google-cloud-* dependencies.
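As a sketch, such a requirements.txt might look like the following (the exact package name and version pin are assumptions; list whatever your pipeline actually imports):

```
# Hypothetical contents of requirements.txt
google-cloud-storage==1.10.0
```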
I then spin up my pipeline like this:
python myPipeline.py --requirements_file requirements.txt --save_main_session True
The last flag tells the SDK to pickle the state of the __main__ module, so the imports you do in main are also available on the workers.
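For reference, the same two flags can also be set programmatically instead of on the command line. This is a minimal configuration sketch, assuming apache-beam is installed; the runner name and requirements.txt path mirror the example above and are not meant to run as-is:

```python
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

# Equivalent of: --requirements_file requirements.txt --save_main_session True
options = PipelineOptions(runner='DataflowRunner')
setup = options.view_as(SetupOptions)
setup.requirements_file = 'requirements.txt'  # staged and installed on the workers
setup.save_main_session = True                # pickle __main__ state (incl. imports)

# Then pass `options` when constructing the pipeline:
# with beam.Pipeline(options=options) as p: ...
```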
Upvotes: 1