Reputation: 1060
I am attempting to upload a file to BigQuery using Google's Python client library (google-cloud-bigquery==1.3.0).
Following the documentation:
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1
job_config.autodetect = True

with open(filename, 'rb') as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location='US',
        job_config=job_config)

job.result()
This successfully creates the table and inserts the data; in this case, a simple one-column CSV of strings. However, the column ends up named string_field_0.
Is there a way I can customize these column names?
Upvotes: 0
Views: 2104
Reputation: 1946
You should be able to add a schema for the load job, e.g.
job_config.schema = [bigquery.SchemaField('columnName', 'STRING', mode='NULLABLE')]
Upvotes: 1
Reputation: 14791
Because you have just one column containing only strings, BigQuery can't work out what the header/column names are, even though you are using job_config.skip_leading_rows = 1
in your code, i.e. it can't tell the difference between a data row and a header row. If you had a second column with, say, an integer, then BigQuery would be able to preserve the column names because it could now tell the difference.
As a workaround, either don't use autodetect
and manually specify the schema (it's just one column, after all), or load the data and then hit it with some SQL afterwards to rename the column.
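For the SQL route, something along these lines should work, assuming standard SQL DDL is available in your project (the dataset, table, and target column names here are placeholders):

```sql
-- Rewrite the table in place, renaming the auto-generated column.
CREATE OR REPLACE TABLE `mydataset.mytable` AS
SELECT string_field_0 AS my_column
FROM `mydataset.mytable`;
```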
Upvotes: 1