Ben P

Reputation: 3379

Live data from BigQuery into a Python DataFrame

I am exploring ways to bring BigQuery data into Python; here is my code so far:

from google.cloud import bigquery
from pandas.io import gbq

client = bigquery.Client.from_service_account_json("path_to_my.json")

project_id = "my_project_name"

query_job = client.query("""
    #standardSQL
    SELECT date,
    SUM(totals.visits) AS visits
    FROM `projectname.dataset.ga_sessions_20*` AS t
    WHERE parse_date('%y%m%d', _table_suffix) between 
    DATE_sub(current_date(), interval 3 day) and
    DATE_sub(current_date(), interval 1 day)
    GROUP BY date
    """)

results = query_job.result()  # Waits for job to complete.

#for row in results:
#  print("{}: {}".format(row.date, row.visits))

results_df = gbq.read_gbq(query_job,project_id=project_id)

The commented-out lines (#for row in results: print("{}: {}".format(row.date, row.visits))) print the correct results from my query, but they aren't usable in that form. As a next step I'd like to get the results into a DataFrame, but the read_gbq call returns the error TypeError: Object of type 'QueryJob' is not JSON serializable.

Can anyone tell me what is wrong with my code that generates this error, or perhaps suggest a better way to bring BigQuery data into a DataFrame?

Upvotes: 1

Views: 1409

Answers (1)

Willian Fuks

Reputation: 11797

The method read_gbq expects a query string as input, not a QueryJob object.

Try running it like this instead:

query = """
    #standardSQL
    SELECT date,
    SUM(totals.visits) AS visits
    FROM `projectname.dataset.ga_sessions_20*` AS t
    WHERE parse_date('%y%m%d', _table_suffix) between 
    DATE_sub(current_date(), interval 3 day) and
    DATE_sub(current_date(), interval 1 day)
    GROUP BY date
"""

# Pass the SQL text itself; authentication is handled via the service-account key file
results_df = gbq.read_gbq(query, project_id=project_id, private_key='path_to_my.json')
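
As a side note, since you already have a QueryJob from the google-cloud-bigquery client, you could also skip pandas-gbq entirely and convert the result to a DataFrame directly. Here is a minimal sketch, assuming your installed google-cloud-bigquery version is recent enough to provide QueryJob.to_dataframe() and that pandas is installed:

from google.cloud import bigquery

# Authenticate with the same service-account JSON used in the question
client = bigquery.Client.from_service_account_json("path_to_my.json")

query = """
    #standardSQL
    SELECT date,
    SUM(totals.visits) AS visits
    FROM `projectname.dataset.ga_sessions_20*` AS t
    WHERE parse_date('%y%m%d', _table_suffix) between
    DATE_sub(current_date(), interval 3 day) and
    DATE_sub(current_date(), interval 1 day)
    GROUP BY date
"""

query_job = client.query(query)

# to_dataframe() waits for the job to complete and returns a pandas DataFrame
results_df = query_job.to_dataframe()

This keeps everything inside the BigQuery client library, so no second authentication path is needed.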

Upvotes: 4
