Reputation: 1052
I am running the following Cloud Function. It runs successfully and indicates that data was loaded to the table, but when I query BigQuery no data has been added. I am getting no errors and no indication that it isn't working.
from google.cloud import bigquery
import pandas as pd

def download_data(event, context):
    df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/full_data.csv')

    # Create an empty list
    Row_list = []

    # Iterate over each row
    for index, rows in df.iterrows():
        # Create list for the current row
        my_list = [rows.date, rows.location, rows.new_cases, rows.new_deaths, rows.total_cases, rows.total_deaths]
        #print(my_list)

    # append the list to the final list
    Row_list.append(my_list)

    ## Get BigQuery set up
    client = bigquery.Client()
    table_id = "<project_name>.raw.daily_load"
    table = client.get_table(table_id)
    print(client)
    print(table_id)
    print(table)

    errors = client.insert_rows(table, Row_list)  # Make an API request.
    if errors == []:
        print("New rows have been added.")
Attempted so far;
Not sure what to try next; any advice would be greatly appreciated.
Maybe this page in the Google Cloud documentation could help:
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#loading_csv_data_into_a_table
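If you go down that route, the core of the load-job approach from that page looks roughly like this (a sketch, assuming the CSV has first been uploaded to a Cloud Storage bucket; the bucket name and object path are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "<project_name>.raw.daily_load"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the CSV header row
    autodetect=True,      # let BigQuery infer the schema
)

uri = "gs://<bucket_name>/full_data.csv"  # placeholder bucket/object
load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # waits for the job to finish and raises on failure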
Upvotes: 3
Views: 15934
Reputation: 1485
You can stream the data from the website directly into BigQuery using Cloud Functions, but the data should be clean and conform to BigQuery standards, otherwise the insertion will fail. One more point to note: the dataframe columns must match the table columns for the data to be inserted successfully. I tested this out and saw insertion errors returned by the client when the column names didn't match.
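For example, a quick way to surface such a mismatch before inserting (a sketch; columns_match is my own helper, not part of the BigQuery API):

from google.cloud import bigquery
import pandas as pd

def columns_match(df: pd.DataFrame, table_id: str) -> bool:
    """Return True if the dataframe columns line up with the table schema."""
    client = bigquery.Client()
    table = client.get_table(table_id)
    schema_columns = [field.name for field in table.schema]
    return list(df.columns) == schema_columns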
Writing the function
I have created a simple Cloud Function based on the documentation and the pandas example. The dependencies that need to be included are google-cloud-bigquery and pandas.
main.py
from google.cloud import bigquery
import pandas as pd

def hello_gcs(event, context):
    df = pd.read_csv('https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv')
    # Rename the columns if necessary so they match the BigQuery table
    df = df.set_axis(["Month", "Year_1", "Year_2", "Year_3"], axis=1)
    table_id = "project.dataset.airtravel"

    ## Get BigQuery set up
    client = bigquery.Client()
    table = client.get_table(table_id)

    errors = client.insert_rows_from_dataframe(table, df)  # Make an API request.
    # insert_rows_from_dataframe returns one error list per chunk of rows
    if all(not chunk for chunk in errors):
        print("Data Loaded")
        return "Success"
    else:
        print(errors)
        return "Failed"
requirements.txt
# Function dependencies, for example:
# package>=version
google-cloud-bigquery
pandas
Now you can directly deploy the function.
Output
(screenshot of the resulting BigQuery table omitted)
Upvotes: 3
Reputation: 3794
This assumes that the App Engine default service account has the default Editor role assigned and that you have a very simple schema for the BigQuery table. For example:
Field name      Type     Mode
date            STRING   NULLABLE
location        STRING   NULLABLE
new_cases       INTEGER  NULLABLE
new_deaths      INTEGER  NULLABLE
total_cases     INTEGER  NULLABLE
total_deaths    INTEGER  NULLABLE
The following modification of your code should work for an HTTP-triggered function. Notice that your Row_list.append(my_list) was not inside the for loop, so the list was never fully populated, and that according to the samples in the documentation you should be using a list of tuples:
from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()
table_id = "[PROJECT-ID].[DATASET].[TABLE]"

def download_data(request):
    df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/full_data.csv')

    # Create an empty list
    Row_list = []

    # Iterate over each row
    for index, rows in df.iterrows():
        # Create a tuple for the current row
        my_list = (rows.date, rows.location, rows.new_cases, rows.new_deaths, rows.total_cases, rows.total_deaths)
        # append the tuple to the final list
        Row_list.append(my_list)

    ## Get BigQuery set up
    table = client.get_table(table_id)

    errors = client.insert_rows(table, Row_list)  # Make an API request.
    if errors == []:
        print("New rows have been added.")
With the very simple requirements.txt file:
# Function dependencies, for example:
# package>=version
pandas
google-cloud-bigquery
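As a follow-up, streaming inserts are queryable almost immediately even when the table preview lags behind, so you can confirm the rows actually landed with a quick count query (a sketch using the same placeholder table id):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "[PROJECT-ID].[DATASET].[TABLE]"

query = f"SELECT COUNT(*) AS row_count FROM `{table_id}`"
for row in client.query(query).result():  # blocks until the query finishes
    print(f"Rows now in table: {row.row_count}")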
Upvotes: 0