Alex Fuss

Reputation: 125

Cloud Function Sending CSV to Cloud Storage

I have a cloud function that is meant to create a CSV from an API call and then send that CSV to Cloud Storage.

Here is my code:

import requests
import pprint
import pandas as pd
from flatsplode import flatsplode
import csv
import datetime
import schedule
import time
import json
import numpy as np
import os
import tempfile
from google.cloud import storage

api_url = 'https://[YOUR_DOMAIN].com/api/v2/[API_KEY]/keywords/list?site_id=[SITE_ID][&start={start}][&results=100]&format=json'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    payload = response.json() # Parse `response.text` into JSON
    pp = pprint.PrettyPrinter(indent=1)

    # Use the flatsplode package to quickly turn the JSON response to a DF
    new_list = pd.DataFrame(list(flatsplode(payload)))

    # Drop certain columns from the DF
    idx = np.r_[1:5,14:27,34,35]
    new_list = new_list.drop(new_list.columns[idx], axis=1)

    # Create a csv and load it to google cloud storage
    new_list = new_list.to_csv('/tmp/temp.csv')
    def upload_blob(bucket_name, source_file_name, destination_blob_name):

        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        blob.upload_from_file(source_file_name)

    message = "Data for CSV file"    # ERROR HERE
    csv = open(new_list, "w")
    csv.write(message)
    with open(new_list, 'r') as file_obj:
        upload_blob('data-exports', file_obj, 'data-' + str(datetime.date.today()) + '.csv')

export_data(api_url)

I tried writing the file to /tmp so that I could upload it to storage, but haven't had much success. The API call works like a charm and I am able to get a CSV locally. The upload to Cloud Storage is where I get the error.

Any help is much appreciated!

Upvotes: 2

Views: 2029

Answers (2)

jccampanero

Reputation: 53381

Instead of using temporary storage in your cloud function, try converting your dataframe to a string and uploading the result to Google Cloud Storage.

Consider for instance:

import datetime

import numpy as np
import pandas as pd
import requests
from flatsplode import flatsplode
from google.cloud import storage

api_url = 'https://[YOUR_DOMAIN].com/api/v2/[API_KEY]/keywords/list?site_id=[SITE_ID][&start={start}][&results=100]&format=json'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    payload = response.json() # Parse `response.text` into JSON

    # Use the flatsplode package to quickly turn the JSON response to a DF
    new_list = pd.DataFrame(list(flatsplode(payload)))

    # Drop certain columns from the DF
    idx = np.r_[1:5,14:27,34,35]
    new_list = new_list.drop(new_list.columns[idx], axis=1)

    # Convert your df to str: it is straightforward, just do not provide
    # any value for the first param path_or_buf
    csv_str = new_list.to_csv()

    # Then, upload it to cloud storage
    def upload_blob(bucket_name, data, destination_blob_name):

        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        # Note the use of upload_from_string here. Please, provide
        # the appropriate content type if you wish
        blob.upload_from_string(data, content_type='text/csv')

    upload_blob('data-exports', csv_str, 'data-' + str(datetime.date.today()) + '.csv')

export_data(api_url)

Upvotes: 3

Michael Delgado

Reputation: 15432

From what I can tell, you've got a couple of issues here.

First up, DataFrame.to_csv returns None when a file path or buffer is provided as an argument. So this line writes the file, but also assigns the value None to new_list.

new_list = new_list.to_csv('/tmp/temp.csv')

To fix this, simply drop the assignment - you only need the new_list.to_csv('/tmp/temp.csv') line.
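As a small illustration of that return-value behaviour, here is a sketch using a made-up two-row dataframe (the column names and the temp-directory path are placeholders, not your real data):

```python
import os
import tempfile

import pandas as pd

# Made-up data standing in for the flattened API response.
df = pd.DataFrame({"keyword": ["alpha", "beta"], "rank": [1, 2]})

# With a path, to_csv writes the file and returns None.
csv_path = os.path.join(tempfile.gettempdir(), "temp.csv")
result_with_path = df.to_csv(csv_path, index=False)

# Without a path, to_csv returns the CSV text as a string.
result_without_path = df.to_csv(index=False)
```

So after the first call the dataframe is still in df and the file is on disk; there is nothing useful to assign.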

This first error is causing the problem later on: new_list is now None, and open(None, "w") raises a TypeError because open expects a path. Instead, provide the path string '/tmp/temp.csv' as the argument to open. Also, if you use the open mode 'w', the file is truncated first, so the CSV data that to_csv just wrote will be overwritten by the message. What's the format you're going for here? Do you mean to append to the file, with 'a'?

message = "Data for CSV file"    # ERROR HERE
csv = open(new_list, "w")
csv.write(message)
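To make the difference between the two modes concrete, here is a minimal sketch on a scratch file (the path and the row contents are placeholders I made up, not your export):

```python
import os
import tempfile

# Hypothetical scratch file standing in for /tmp/temp.csv.
path = os.path.join(tempfile.mkdtemp(), "temp.csv")

with open(path, "w") as f:
    f.write("keyword,rank\nalpha,1\n")

# Mode "a" keeps the existing rows and adds to the end.
with open(path, "a") as f:
    f.write("beta,2\n")
with open(path) as f:
    after_append = f.read()

# Mode "w" truncates the file first, so the CSV rows are lost.
with open(path, "w") as f:
    f.write("Data for CSV file")
with open(path) as f:
    after_write = f.read()
```

After the last write the file contains only the message, which is probably not what you want to upload.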

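Finally, you're providing a file object to the upload_blob function's source_file_name argument, whose name suggests a path string. The function passes it on to blob.upload_from_file, which does accept a file object, but expects one opened in binary mode ('rb'), not text mode ('r').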


with open(new_list, 'r') as file_obj:
    upload_blob('data-exports', file_obj, 'data-' + str(datetime.date.today()) + '.csv')

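I think here you can skip the file open and just pass the path to the file as the second argument, as long as upload_blob calls blob.upload_from_filename (which takes a path) rather than blob.upload_from_file.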
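A rough sketch of what the path-based version could look like. The function name upload_csv_by_path is my own, and I'm assuming bucket is the object returned by storage.Client().get_bucket(...):

```python
def upload_csv_by_path(bucket, source_path, destination_blob_name):
    """Upload the file at source_path to the given bucket.

    bucket is assumed to behave like a google.cloud.storage Bucket,
    i.e. it has a blob(name) method.
    """
    blob = bucket.blob(destination_blob_name)
    # upload_from_filename takes a path string;
    # upload_from_file takes an already-open binary file object.
    blob.upload_from_filename(source_path, content_type="text/csv")
    return blob
```

With the real client this would be called along the lines of upload_csv_by_path(storage.Client().get_bucket('data-exports'), '/tmp/temp.csv', 'data-' + str(datetime.date.today()) + '.csv').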

Upvotes: 0
