Iterator516
Iterator516

Reputation: 287

Convert string columns into numeric columns with SQL in Google BigQuery

I'm analyzing the data of New York City taxi trips of yellow cars in 2018. (You need a Google BigQuery account to access this data set.)

The schema says that most of the columns are numeric. However, when I tried to calculate the sum of the key dollar figures (tip_amount, tolls_amount, total_amount), I got an error message saying that they are string variables.

SELECT sum(total_amount) 
FROM [bigquery-public-data:new_york_taxi_trips.tlc_yellow_trips_2018] 
WHERE month(dropoff_datetime) = 12

Error: Field total_amount is of type STRING which is not supported for SUM

I then tried to use the cast() function to convert it to a numeric variable, but that did not work.

SELECT sum(total_amount_numeric) FROM 
(
     SELECT cast(total_amount as numeric) as total_amount_numeric 
     FROM [bigquery-public-data:new_york_taxi_trips.tlc_yellow_trips_2018]
     WHERE month(dropoff_datetime) = 12
)

Error: Field total_amount_numeric is of type STRING which is not supported for SUM

How can I analyze these numeric variables as I intended, instead of the string variables as they are erroneously set in the database?

Upvotes: 2

Views: 12308

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

Below is for BigQuery Standard SQL

#standardSQL
SELECT SUM(total_amount) 
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018`
WHERE EXTRACT(MONTH FROM dropoff_datetime) = 12  

The problem you had is because NUMERIC data type is not supported by BigQuery Legacy SQL and rather is treated as STRING and cannot CAST to neither FLOAT nor INTEGER

So, the workaround is to use BigQuery Standard SQL as in above example - and as you see here you don't need to do any CAST'ing as this field is already NUMERIC

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269563

Your query will run as follows in Standard SQL:

SELECT sum(total_amount_numeric)  
FROM (SELECT cast(total_amount as numeric) as total_amount_numeric 
      FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018`
      WHERE EXTRACT(month FROM dropoff_datetime) = 12
     ) x;

You can include this hint before the query to ensure that it is run using standard SQL:

#standardSQL

Upvotes: 1

Related Questions