Reputation: 3379
I am exploring ways to bring BigQuery data into Python, here is my code so far:
from google.cloud import bigquery
from pandas.io import gbq
client = bigquery.Client.from_service_account_json("path_to_my.json")
project_id = "my_project_name"
query_job = client.query("""
#standardSQL
SELECT date,
SUM(totals.visits) AS visits
FROM `projectname.dataset.ga_sessions_20*` AS t
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY date
""")
results = query_job.result() # Waits for job to complete.
#for row in results:
# print("{}: {}".format(row.date, row.visits))
results_df = gbq.read_gbq(query_job,project_id=project_id)
The commented out lines: #for row in results:
print("{}: {}".format(row.date, row.visits))
return the correct results from my query, but they aren't usable in this form, as a next step I'd like to get them into a dataframe, but this code returns the error TypeError: Object of type 'QueryJob' is not JSON serializable
.
Can anyone tell me what is wrong with my code to generate this error, or perhaps suggest a better way to bring in BigQuery data to a dataframe?
Upvotes: 1
Views: 1409
Reputation: 11797
The method read_gbq
expects a str
as input and not a QueryJob
one.
Try running it like this instead:
query = """
#standardSQL
SELECT date,
SUM(totals.visits) AS visits
FROM `projectname.dataset.ga_sessions_20*` AS t
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY date
"""
results_df = gbq.read_gbq(query, project_id=project_id, private_key='path_to_my.json')
Upvotes: 4