adrpino
adrpino

Reputation: 1060

Configure column names in BigQuery file load python

I am attempting to upload a file from Google's BigQuery Python library (google-cloud-bigquery==1.3.0)

Using documentation:

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1
job_config.autodetect = True

with open(filename, 'rb') as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location='US',
        job_config=job_config)

job.result()

This successfully creates the table and inserts the data, in this case, a simple one-column file of string type.

However, it sets the following column name: string_field_0.

Is there a way I can customize these column names?

Upvotes: 0

Views: 2104

Answers (2)

Bobbylank
Bobbylank

Reputation: 1946

You should be able to add a schema for the load job. e.g.

job_config.schema = [SchemaField('columnName', 'STRING', mode='nullable')]

Upvotes: 1

Graham Polley
Graham Polley

Reputation: 14791

Because you have just one column of only Strings, BigQuery can't work out what the header/column names are even though you are using job_config.skip_leading_rows = 1 in your code i.e it doesn't know the difference between a row and a header. If you had a second column with say, an Integer, then BigQuery would be able to preserve the column names because it can now tell the difference.

As a workaround, either don't use auto_detect and manually specify the schema (it's just one column after all), or load it and then hit it with some SQL after and rename the column.

Upvotes: 1

Related Questions