claudiadast

Reputation: 429

Uploading multiple files to Google Cloud Storage via Python Client Library

The GCP Python docs include a script with the following function:

def upload_pyspark_file(project_id, bucket_name, filename, file):
    """Uploads the PySpark file in this directory to the configured
    input bucket."""
    print('Uploading pyspark file to GCS')
    client = storage.Client(project=project_id)
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(filename)
    blob.upload_from_file(file)

I've written an argument-parsing function in my script that accepts multiple file names to upload to a GCS bucket. I'd like to adapt the function above to loop over those arguments and upload each file, but I'm unsure how to proceed. My confusion is with the 'filename' and 'file' parameters above. How can I adapt the function for my purpose?

Upvotes: 1

Views: 4679

Answers (1)

Walaitki

Reputation: 165

I don't suppose you're still looking for something like this?

from google.cloud import storage
import os

files = os.listdir('data-files')
client = storage.Client.from_service_account_json('cred.json')
bucket = client.get_bucket('xxxxxx')


def upload_pyspark_file(filename, file):
    """Upload the local file at path `file` to the bucket as object `filename`."""
    print('Uploading from', file, 'to', filename)
    blob = bucket.blob(filename)
    # `file` is a path string here, so use upload_from_filename;
    # upload_from_file expects an already-open file object instead.
    blob.upload_from_filename(file)


for f in files:
    # os.path.join builds the local path portably instead of hard-coding "\\".
    upload_pyspark_file(f, os.path.join('data-files', f))

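As you may have guessed, file is the source (the local file being uploaded) and filename is the destination (the name the object gets in the bucket). Note that upload_from_file expects an open file object, while upload_from_filename takes a path string, so pick the call that matches what you pass in.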
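Since the question mentions parsing several file names from the command line, here is a minimal sketch of how the two pieces might fit together. The names parse_args, upload_files and main, the positional arguments, and the choice to use each file's base name as the object name are all assumptions made for illustration; they are not from the GCP docs. It also assumes default credentials are available to storage.Client().

```python
import argparse
import os


def parse_args(argv=None):
    """Parse a bucket name followed by one or more local file paths."""
    parser = argparse.ArgumentParser(description="Upload files to a GCS bucket.")
    parser.add_argument("bucket_name", help="destination bucket")
    parser.add_argument("paths", nargs="+", help="local files to upload")
    return parser.parse_args(argv)


def upload_files(bucket, paths):
    """Upload each local path to `bucket`; return the object names used."""
    uploaded = []
    for path in paths:
        # Assumption: the object name is the file's base name.
        blob_name = os.path.basename(path)
        blob = bucket.blob(blob_name)
        # upload_from_filename takes a path string (not a file object).
        blob.upload_from_filename(path)
        uploaded.append(blob_name)
    return uploaded


def main(argv=None):
    # Imported here so the helpers above can be used without the library.
    from google.cloud import storage

    args = parse_args(argv)
    client = storage.Client()  # assumes default credentials are configured
    bucket = client.get_bucket(args.bucket_name)
    for name in upload_files(bucket, args.paths):
        print("Uploaded", name)
```

Calling main() at the bottom of the script and running it as "python upload.py my-bucket a.py b.py" (a hypothetical script and bucket name) would upload both files. Passing the bucket in as a parameter keeps upload_files easy to try out with a stand-in object before pointing it at a real bucket.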

Upvotes: 2
