Reputation: 3
I'm trying to read the following rows out of a CSV file stored in GCS
headers: "A","B","C","D"
row1: "4000,0000000000000","15400000,000","12311918,400000","3088081,600"
row2: "5000,0000000000000","19250000,000","15389898,000000","3860102,000"
The issue here is how BigQuery is actually interpreting and thus outputting these numbers:
It's interpreting A as FLOAT64, and B, C and D as INT64, which is expected since I decided to use schema autodetection. But when I try to cast the columns to different types, it still outputs the numbers incorrectly.
This is the query:
SELECT
CAST(quantity AS INT64) AS A,
CAST(expenses_2 AS FLOAT64) AS B,
CAST(cexpenses_3 AS FLOAT64) AS C,
CAST(expenses_4 AS FLOAT64) AS D
FROM
`wide-gecko-289100.bqtest.expenses`
These are the results of the query above:
Either way, it's misreading the numbers. They should come out as follows:
row1: [4000] [15400000] [12311918,4] [3088081,6]
row2: [5000] [19250000] [15389898] [3860102]
Is there a way to solve this?
Upvotes: 0
Views: 1836
Reputation: 4384
This is due to BigQuery not understanding the localized format you're using for the numeric values. It expects the period (.) character for the decimal separator.
If you can't fix this upstream, in the process that produces the CSV files, another strategy is to load the columns into BigQuery as strings and then convert them with some string manipulation.
Here's a simple conversion example that shows some string manipulation and casting to get to the desired type. If you're using both commas and periods as part of the localized format, you'll need a more complex string manipulation.
WITH
sample_row AS (
SELECT "4000,0000000000000" as A, "15400000,000" as B,"12311918,400000" as C,"3088081,600" as D
)
SELECT
A,
CAST(REPLACE(A,",",".") AS FLOAT64) as A_as_float64,
CAST(CAST(REPLACE(A,",",".") AS FLOAT64) AS INT64) as A_as_int64
FROM
sample_row
You could also generalize this as a user-defined function (temporary or persistent) to make it easier to reuse:
CREATE TEMPORARY FUNCTION parseAsFloat(instr STRING) AS (CAST(REPLACE(instr,",",".") AS FLOAT64));
WITH
sample_row AS (
SELECT "4000,0000000000000" as A, "15400000,000" as B,"12311918,400000" as C,"3088081,600" as D
)
SELECT
CAST(parseAsFloat(A) AS INT64) as A,
parseAsFloat(B) as B,
parseAsFloat(C) as C,
parseAsFloat(D) as D,
FROM
sample_row
Upvotes: 1
Reputation: 3032
I think this is an issue with how BigQuery interprets a comma. It seems to treat it as a thousands separator rather than a decimal separator.
https://issuetracker.google.com/issues/129992574
Is it possible to replace the comma with a "." before loading the file?
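If the file can be rewritten before it is loaded, the replacement could be done with a small preprocessing step. The following is only a minimal sketch in Python, not something from the original thread: the function name normalize_decimal_commas is hypothetical, and it assumes every numeric field has the form digits, one comma, digits, with no thousands separators. Fields that don't match that pattern (such as the header names) are copied unchanged.

```python
import csv
import io
import re

# Assumption: a localized number is digits, one decimal comma, digits,
# e.g. "12311918,400000". Anything else is left untouched.
DECIMAL_COMMA = re.compile(r"-?\d+,\d+")


def normalize_decimal_commas(src, dst):
    """Copy CSV rows from src to dst, rewriting "123,45" as "123.45"."""
    reader = csv.reader(src)
    # Quote every field so the output keeps the same shape as the input file.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [
                value.replace(",", ".") if DECIMAL_COMMA.fullmatch(value) else value
                for value in row
            ]
        )


# Sample data taken from the question; in-memory streams stand in for files.
sample = (
    '"A","B","C","D"\n'
    '"4000,0000000000000","15400000,000","12311918,400000","3088081,600"\n'
)
out = io.StringIO()
normalize_decimal_commas(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, the same function could be called on two handles opened with open(..., newline=""), and the rewritten file uploaded to GCS in place of the original.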
Upvotes: 1