Configure column names in BigQuery file load python

Question

I am attempting to upload a file from Google's BigQuery Python library (google-cloud-bigquery==1.3.0)

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1
job_config.autodetect = True

with open(filename, 'rb') as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location='US',
        job_config=job_config)

job.result()

This successfully creates the table and inserts the data, in this case, a simple one-column file of string type.

However, it sets the following column name: string_field_0.

Is there a way I can customize these column names?

Graham Polley · Accepted Answer

Because you have just one column of only Strings, BigQuery can't work out what the header/column names are even though you are using job_config.skip_leading_rows = 1 in your code i.e it doesn't know the difference between a row and a header. If you had a second column with say, an Integer, then BigQuery would be able to preserve the column names because it can now tell the difference.

As a workaround, either don't use auto_detect and manually specify the schema (it's just one column after all), or load it and then hit it with some SQL after and rename the column.

Configure column names in BigQuery file load python

Answers (2)

Related Questions