Reputation: 474
I'm examining the NOAA severe storms dataset in Google Bigquery and I wanted to iterate over the tables to get the storms per year between 1960 and 2022. with property damage.
from google.cloud import bigquery
client = bigquery.Client.from_service_account_json("secret.json")
for year in range(1960, 2023):
query = f"""
SELECT count(*) AS ct
FROM `bigquery-public-data.noaa_historic_severe_storms.storms_{year}`
WHERE damage_property > 0
"""
query_job = client.query(query).result()
for row in query_job:
print(row)
The print statement prints results like:
Row((552,), {'ct': 0})
Row((620,), {'ct': 0})
Row((401,), {'ct': 0})
It returns 0 for the count, but excecuting the same query in the GBQ console gets me the correct result. If I use a different query:
query = f"""
SELECT *
FROM `bigquery-public-data.noaa_historic_severe_storms.storms_{year}`
WHERE damage_property > 0
"""
with the API, it works fine and gives me the correct results, the same results I get running it in the GBQ console. Why does count(*) work unexpectedly? How Do I get the correct results?
Upvotes: 0
Views: 680
Reputation: 78
BigQuery first returns a query object and from there you can get the result using .result(). But that is an Iterator object so you have to iterate to get the values. In a count (*) you know that there is only one row to iterate on so you can use "next". And that returns a row with one value in it and thus the [0].
iterator = client.query(query).result()
first_row = next(iterator)
print(first_row[0])
Upvotes: 1
Reputation: 2116
To access your query result you'll have to use either row[0]
or row.ct
in the code:
query_job = client.query(query).result()
for row in query_job:
print(row[0])
Upvotes: 1