Reputation: 462
I'm trying to load a file from GCS to BigQuery whose schema is auto-generated from the file in GCS. I'm using Apache Airflow to do the same, the problem I'm having is that when I use auto-detect schema from file, BigQuery creates schema based on some ~100 initial values.
For example, in my case there is a column say X
, the values in X
is mostly of Integer
type, but there are some values which are of String
type, so bq load
will fail with schema mismatch, in such a scenario we need to change the data type to STRING
.
So what I could do is manually create a new table by generating schema on my own. Or I could set the max_bad_record
value to some 50
, but that doesn't seem like a good solution. An ideal solution would be like this:
BQ
without any data mismatch, then I don't need to do anything.Upvotes: 0
Views: 726
Reputation: 3642
As you can not change column type in bq (see this link)
BigQuery natively supports the following schema modifications:
BigQuery natively supports the following schema modifications:
* Adding columns to a schema definition
* Relaxing a column's mode from REQUIRED to NULLABLEAll other schema modifications are unsupported and require manual workarounds
So as a workaround I suggest:
--max_rows_per_request = 1
in your scriptThis will create the table with the correct schema and 1 line and from there you can load the rest of the data.
Upvotes: 1