Christina

Reputation: 73

How to upload a dataframe to Google Cloud Storage (bucket) in Python 3?

I want to create a Cloud Function (which shall be executed daily at 01:00). The function should

  1. generate a dataframe
  2. [export as dataframe.csv] <---- not sure if required
  3. push the dataframe (or .csv) to a bucket

.....

Updated code (still giving an error):

def push_cars( data ):    ##  <<----- not sure how many parameters & why??

    import requests
    import pandas as pd
    import os
    from datetime import datetime

    from google.cloud.storage.blob import Blob
    from google.cloud import storage
    #import csv               # <<--- not sure if required???


    cars_dict = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
        'Price': [22000,25000,27000,35000]
        }

    cars = pd.DataFrame(cars_dict, columns = ['Brand', 'Price'])

    timestamp = datetime.now().strftime("%Y_%m_%d-%H_%M_%S")
    name = "cars_" + timestamp + ".csv"

    cars.to_csv("/tmp/test.csv", index=False)
    with open('/tmp/test.csv', "w") as csv: 
      csv.write(name) 

    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "My-project.json"

    target_bucket = 'cars:python_gogo'


    storage_client = storage.Client()
    bucket = storage_client.get_bucket(target_bucket)
    data = bucket.blob(name_output)


To replicate this in the cloud, you need to create a requirements.txt with the following content:

requests
pandas
google-cloud-storage
datetime

In the Cloud Shell, I am using the following to deploy this Cloud Function:

    gcloud functions deploy push_cars --entry-point=push_cars --runtime=python37 --memory=1024MB --region=us-east1 --allow-unauthenticated --trigger-http

Upvotes: 0

Views: 4226

Answers (2)

Minesh Barot

Reputation: 97

Use df.to_csv('file path') to save the CSV directly to the Cloud Storage bucket, putting your GCS bucket path in place of the file path.

For example - df.to_csv('gs://bucketname/filename.csv')
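A minimal sketch of this approach, assuming the gcsfs package is installed (pandas delegates gs:// paths to it) and using a hypothetical bucket name:

    import pandas as pd

    # Hypothetical example data; 'my-bucket' is a placeholder bucket name.
    df = pd.DataFrame({'Brand': ['Honda Civic', 'Audi A4'],
                       'Price': [22000, 35000]})

    # pandas hands gs:// URLs to gcsfs (pip install gcsfs), which picks up
    # credentials from the environment, e.g. GOOGLE_APPLICATION_CREDENTIALS.
    df.to_csv('gs://my-bucket/cars.csv', index=False)

This skips the /tmp step entirely; just remember to add gcsfs to requirements.txt.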

Upvotes: 3

Soni Sol

Reputation: 2612

Question 1:

The dataframe cannot be written directly to Cloud Storage; it first needs to be a file (the .csv you mentioned works), and then you can upload that file to the Google Cloud Storage bucket. This means that step 2 is required.

Question 2:

Once you have dataframe.csv saved in /tmp, you can transfer it to the Google Cloud Storage bucket.

The code implementing both steps would look something like this:

def push_cars(data, context):

    import pandas as pd
    import os
    from datetime import datetime

    from google.cloud import storage


    cars_dict = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
                 'Price': [22000, 25000, 27000, 35000]}

    cars = pd.DataFrame(cars_dict, columns=['Brand', 'Price'])

    timestamp = datetime.now().strftime("%Y_%m_%d-%H_%M_%S")
    name = "cars_" + timestamp + ".csv"

    # /tmp is the only writable directory in the Cloud Functions runtime
    cars.to_csv('/tmp/test.csv', index=False)

    # Only needed when running outside Google Cloud; on Cloud Functions the
    # runtime's service account is used automatically.
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "My-project.json"

    target_bucket = 'sp500_python_gogo'

    storage_client = storage.Client()
    bucket = storage_client.get_bucket(target_bucket)

    # Upload the local CSV to the bucket under the timestamped object name
    blob = bucket.blob(name)
    blob.upload_from_filename('/tmp/test.csv')
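A note on the signature: (data, context) is what the Python runtime passes to background (e.g. Pub/Sub-triggered) functions, while an HTTP-triggered function receives a single request argument. Since the goal is a daily 01:00 run, one option is to deploy with a Pub/Sub trigger and drive it from Cloud Scheduler; a rough sketch, with hypothetical topic and job names:

    gcloud pubsub topics create daily-cars
    gcloud functions deploy push_cars --entry-point=push_cars \
        --runtime=python37 --memory=1024MB --region=us-east1 \
        --trigger-topic=daily-cars
    gcloud scheduler jobs create pubsub daily-cars-job \
        --schedule="0 1 * * *" --topic=daily-cars --message-body="run"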

Upvotes: 1
