Reputation: 125
I have a cloud function that is meant to create a CSV from an API call and then send that CSV to Cloud Storage.
Here is my code:
import requests
import pprint
import pandas as pd
from flatsplode import flatsplode
import csv
import datetime
import schedule
import time
import json
import numpy as np
import os
import tempfile
from google.cloud import storage

api_url = 'https://[YOUR_DOMAIN].com/api/v2/[API_KEY]/keywords/list?site_id=[SITE_ID][&start={start}][&results=100]&format=json'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    payload = response.json()  # Parse `response.text` into JSON
    pp = pprint.PrettyPrinter(indent=1)
    # Use the flatsplode package to quickly turn the JSON response to a DF
    new_list = pd.DataFrame(list(flatsplode(payload)))
    # Drop certain columns from the DF
    idx = np.r_[1:5,14:27,34,35]
    new_list = new_list.drop(new_list.columns[idx], axis=1)
    # Create a csv and load it to google cloud storage
    new_list = new_list.to_csv('/tmp/temp.csv')

    def upload_blob(bucket_name, source_file_name, destination_blob_name):
        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        blob.upload_from_file(source_file_name)

    message = "Data for CSV file"  # ERROR HERE
    csv = open(new_list, "w")
    csv.write(message)
    with open(new_list, 'r') as file_obj:
        upload_blob('data-exports', file_obj, 'data-' + str(datetime.date.today()) + '.csv')

export_data(api_url)
I attempted to put the file in the /tmp directory to allow me to write it to storage, but haven't had much success. The API call works like a charm and I am able to get a CSV locally. The upload to Cloud Storage is where I get the error.
Any help is much appreciated!
Upvotes: 2
Views: 2029
Reputation: 53381
Instead of using temporary storage in your Cloud Function, try converting your DataFrame to a string and uploading the result to Google Cloud Storage.
Consider for instance:
import requests
import pprint
import pandas as pd
from flatsplode import flatsplode
import csv
import datetime
import schedule
import time
import json
import numpy as np
import os
import tempfile
from google.cloud import storage

api_url = 'https://[YOUR_DOMAIN].com/api/v2/[API_KEY]/keywords/list?site_id=[SITE_ID][&start={start}][&results=100]&format=json'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    payload = response.json()  # Parse `response.text` into JSON
    pp = pprint.PrettyPrinter(indent=1)
    # Use the flatsplode package to quickly turn the JSON response to a DF
    new_list = pd.DataFrame(list(flatsplode(payload)))
    # Drop certain columns from the DF
    idx = np.r_[1:5,14:27,34,35]
    new_list = new_list.drop(new_list.columns[idx], axis=1)
    # Convert your df to str: it is straightforward, just do not provide
    # any value for the first param path_or_buf
    csv_str = new_list.to_csv()

    # Then, upload it to cloud storage
    def upload_blob(bucket_name, data, destination_blob_name):
        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        # Note the use of upload_from_string here. Please, provide
        # the appropriate content type if you wish
        blob.upload_from_string(data, content_type='text/csv')

    upload_blob('data-exports', csv_str, 'data-' + str(datetime.date.today()) + '.csv')

export_data(api_url)
Upvotes: 3
Reputation: 15432
From what I can tell, you've got a couple issues here.
First up, DataFrame.to_csv does not return anything if a file path or buffer is provided as an argument. So this line writes the file, but also assigns the value None to new_list:
new_list = new_list.to_csv('/tmp/temp.csv')
To fix this, simply drop the assignment. You only need the new_list.to_csv('/tmp/temp.csv') call.
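To make the return value concrete, here is a minimal sketch with a made-up two-row DataFrame (the column names and the temporary directory are assumptions for illustration, not part of the original function). It shows that to_csv returns None when given a path and returns the CSV text when called without one:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the DataFrame built from the API response.
df = pd.DataFrame({"keyword": ["alpha", "beta"], "rank": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "temp.csv")
    # With a path, the file is written and the call returns None.
    result_with_path = df.to_csv(csv_path)
    file_was_written = os.path.exists(csv_path)

# Without a path, nothing is written and the CSV text is returned as a str.
result_without_path = df.to_csv(index=False)

print(result_with_path)           # None
print(type(result_without_path))  # <class 'str'>
```

So keep the DataFrame and the path in separate variables, and reuse the path string wherever the file needs to be opened again.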
This first error causes the problem later on, because you can't open a file at the location None: open(new_list, "w") raises a TypeError. Instead, provide the path string ('/tmp/temp.csv') as the argument to open. Also, if you use the open mode 'w', the CSV data already in the file will be overwritten. What's the format you're going for here? Do you mean to append to the file, with 'a'?
message = "Data for CSV file" # ERROR HERE
csv = open(new_list, "w")
csv.write(message)
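As a rough illustration of the difference between the two modes, the sketch below writes a small CSV, appends a row with 'a', and then reopens the same file with 'w'. The file contents and the temporary directory are invented for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "temp.csv")

    # Initial CSV content, standing in for what to_csv wrote.
    with open(path, "w") as f:
        f.write("keyword,rank\nalpha,1\n")

    # Mode 'a' keeps the existing rows and adds to the end.
    with open(path, "a") as f:
        f.write("beta,2\n")
    with open(path) as f:
        appended = f.read()

    # Mode 'w' truncates the file first, so the CSV rows are lost.
    with open(path, "w") as f:
        f.write("Data for CSV file")
    with open(path) as f:
        overwritten = f.read()

print(repr(appended))
print(repr(overwritten))
```

Using with blocks also closes each file before it is read again, which the bare open call in the question does not do.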
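Finally, the upload_blob function's source_file_name argument suggests a path string, but you're passing it a file object. blob.upload_from_file does accept a file object, so the name is just misleading; the call still fails here because new_list is None by the time you try to open it.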
with open(new_list, 'r') as file_obj:
    upload_blob('data-exports', file_obj, 'data-' + str(datetime.date.today()) + '.csv')
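I think here you can skip the file open and just pass the path to the file as the second argument, provided upload_blob calls blob.upload_from_filename (which takes a path) rather than blob.upload_from_file (which takes an open file object).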
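Put together, a path-based upload could look roughly like the sketch below. The names upload_csv and dated_blob_name and the optional client parameter are hypothetical helpers introduced here, not part of the question's code. The google-cloud-storage import is done lazily only so the function can be tried with a stand-in client:

```python
import datetime


def upload_csv(bucket_name, source_path, destination_blob_name, client=None):
    """Upload the file at source_path to the named bucket and return the blob."""
    if client is None:
        # Imported lazily so a stand-in client can be passed in for testing.
        from google.cloud import storage
        client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    # upload_from_filename takes a path string, not an open file object.
    blob.upload_from_filename(source_path, content_type="text/csv")
    return blob


def dated_blob_name(prefix="data-", day=None):
    """Build a name such as data-2024-01-02.csv (defaults to today's date)."""
    day = day or datetime.date.today()
    return f"{prefix}{day.isoformat()}.csv"
```

Inside export_data this would be called after new_list.to_csv('/tmp/temp.csv'), for example as upload_csv('data-exports', '/tmp/temp.csv', dated_blob_name()).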
Upvotes: 0