Reputation: 1052
I am running the following Cloud Function. It runs successfully and indicates that data was loaded to the table, but when I query BigQuery no data has been added. I am getting no errors and no indication that it isn't working.
from google.cloud import bigquery
import pandas as pd

def download_data(event, context):
    df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/full_data.csv')

    # Create an empty list
    Row_list = []

    # Iterate over each row
    for index, rows in df.iterrows():
        # Create list for the current row
        my_list = [rows.date, rows.location, rows.new_cases, rows.new_deaths, rows.total_cases, rows.total_deaths]
        #print(my_list)

    # append the list to the final list
    Row_list.append(my_list)

    ## Get BigQuery set up
    client = bigquery.Client()
    table_id = "<project_name>.raw.daily_load"
    table = client.get_table(table_id)
    print(client)
    print(table_id)
    print(table)

    errors = client.insert_rows(table, Row_list)  # Make an API request.
    if errors == []:
        print("New rows have been added.")
Attempted so far;
Not sure what to try next; any advice would be greatly appreciated.
Maybe this page in the Google Cloud documentation could help:
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#loading_csv_data_into_a_table
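If you go down that route, the core of the load-job approach from that page looks roughly like this (a sketch, assuming the CSV has first been uploaded to a Cloud Storage bucket; the bucket name and object path are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "<project_name>.raw.daily_load"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the CSV header row
    autodetect=True,      # let BigQuery infer the schema
)

uri = "gs://<bucket_name>/full_data.csv"  # placeholder bucket/object
load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # waits for the job to finish and raises on failure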
Upvotes: 3
Views: 15934
Reputation: 1485
You can stream the data from the website directly into BigQuery using Cloud Functions, but the data should be clean and conform to BigQuery standards, otherwise the insertion will fail. One more point to note: the dataframe columns must match the table columns for the data to be inserted successfully. I tested this out and saw insertion errors returned by the client when the column names didn't match.
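For example, a quick way to surface such a mismatch before inserting (a sketch; columns_match is my own helper, not part of the BigQuery API):

from google.cloud import bigquery
import pandas as pd

def columns_match(df: pd.DataFrame, table_id: str) -> bool:
    """Return True if the dataframe columns line up with the table schema."""
    client = bigquery.Client()
    table = client.get_table(table_id)
    schema_columns = [field.name for field in table.schema]
    return list(df.columns) == schema_columns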
Writing the function
I have created a simple Cloud Function based on the documentation and the pandas example. The dependencies that need to be included are google-cloud-bigquery and pandas.
main.py
from google.cloud import bigquery
import pandas as pd

def hello_gcs(event, context):
    df = pd.read_csv('https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv')
    # Rename the columns if necessary so they match the BigQuery table
    df = df.set_axis(["Month", "Year_1", "Year_2", "Year_3"], axis=1)
    table_id = "project.dataset.airtravel"

    ## Get BigQuery set up
    client = bigquery.Client()
    table = client.get_table(table_id)

    errors = client.insert_rows_from_dataframe(table, df)  # Make an API request.
    # insert_rows_from_dataframe returns one error list per chunk of rows
    if all(not chunk for chunk in errors):
        print("Data Loaded")
        return "Success"
    else:
        print(errors)
        return "Failed"
requirements.txt
# Function dependencies, for example:
# package>=version
google-cloud-bigquery
pandas
Now you can directly deploy the function.
Output
(screenshot of the resulting BigQuery table omitted)
Upvotes: 3
Reputation: 3794
This assumes that the App Engine default service account has the default Editor role assigned and that you have a very simple schema for the BigQuery table. For example:
Field name      Type     Mode
date            STRING   NULLABLE
location        STRING   NULLABLE
new_cases       INTEGER  NULLABLE
new_deaths      INTEGER  NULLABLE
total_cases     INTEGER  NULLABLE
total_deaths    INTEGER  NULLABLE
The following modification of your code should work for an HTTP-triggered function. Notice that your Row_list.append(my_list) was not inside the for loop, so the list was never fully populated, and that according to the samples in the documentation you should be using a list of tuples:
from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()
table_id = "[PROJECT-ID].[DATASET].[TABLE]"

def download_data(request):
    df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/full_data.csv')

    # Create an empty list
    Row_list = []

    # Iterate over each row
    for index, rows in df.iterrows():
        # Create a tuple for the current row
        my_list = (rows.date, rows.location, rows.new_cases, rows.new_deaths, rows.total_cases, rows.total_deaths)
        # append the tuple to the final list
        Row_list.append(my_list)

    ## Get BigQuery set up
    table = client.get_table(table_id)

    errors = client.insert_rows(table, Row_list)  # Make an API request.
    if errors == []:
        print("New rows have been added.")
With the very simple requirements.txt file:
# Function dependencies, for example:
# package>=version
pandas
google-cloud-bigquery
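As a follow-up, streaming inserts are queryable almost immediately even when the table preview lags behind, so you can confirm the rows actually landed with a quick count query (a sketch using the same placeholder table id):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "[PROJECT-ID].[DATASET].[TABLE]"

query = f"SELECT COUNT(*) AS row_count FROM `{table_id}`"
for row in client.query(query).result():  # blocks until the query finishes
    print(f"Rows now in table: {row.row_count}")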
Upvotes: 0