ahalbert
ahalbert

Reputation: 474

Google Bigquery SELECT count(*) on API returns 0 when Same Count(*) queryin the GBQ Console returns a count

I'm examining the NOAA severe storms dataset in Google Bigquery and I wanted to iterate over the tables to get the storms per year between 1960 and 2022. with property damage.

from google.cloud import bigquery
client = bigquery.Client.from_service_account_json("secret.json")

for year in range(1960, 2023):
    query = f"""
    SELECT count(*) AS ct
    FROM `bigquery-public-data.noaa_historic_severe_storms.storms_{year}`
    WHERE damage_property > 0
    """
    query_job = client.query(query).result()
    for row in query_job:
       print(row)

The print statement prints results like:

Row((552,), {'ct': 0})
Row((620,), {'ct': 0})
Row((401,), {'ct': 0})

It returns 0 for the count, but excecuting the same query in the GBQ console gets me the correct result. If I use a different query:

    query = f"""
    SELECT *
    FROM `bigquery-public-data.noaa_historic_severe_storms.storms_{year}`
    WHERE damage_property > 0
    """

with the API, it works fine and gives me the correct results, the same results I get running it in the GBQ console. Why does count(*) work unexpectedly? How Do I get the correct results?

Upvotes: 0

Views: 680

Answers (2)

Jeff Oberlander
Jeff Oberlander

Reputation: 78

BigQuery first returns a query object and from there you can get the result using .result(). But that is an Iterator object so you have to iterate to get the values. In a count (*) you know that there is only one row to iterate on so you can use "next". And that returns a row with one value in it and thus the [0].

   iterator = client.query(query).result()
   first_row = next(iterator)
   print(first_row[0])

Upvotes: 1

Sakshi Gatyan
Sakshi Gatyan

Reputation: 2116

To access your query result you'll have to use either row[0] or row.ct in the code:

query_job = client.query(query).result()
for row in query_job:
   print(row[0])

Upvotes: 1

Related Questions