Reputation: 1232
I'm trying to upload this data from my bucket into BigQuery, but it's complaining. My file is an Excel file.
ID          A    B  C           D              E       F         Value1  Value2
333344      ALG  A  RETAIL      OTHER          YIPP    Jun 2019  123     4
34563       ALG  A  NON-RETAIL  OTHER          TATS    Mar 2019  124     0
7777777777  -    E  RETAIL      NASAL          KHPO    Jul 2020  1,448   0
7777777777  -    E  RETAIL      SEVERE ASTHMA  PZIFER  Oct 2019  1,493   162
From Python I load the file as follows:
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

table_id = "project.dataset.my_table"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField('ID', 'STRING'),
        bigquery.SchemaField('A', 'STRING'),
        bigquery.SchemaField('B', 'STRING'),
        bigquery.SchemaField('C', 'STRING'),
        bigquery.SchemaField('D', 'STRING'),
        bigquery.SchemaField('E', 'STRING'),
        bigquery.SchemaField('F', 'STRING'),
        bigquery.SchemaField('Value1', 'STRING'),
        bigquery.SchemaField('Value2', 'STRING'),
    ],
    skip_leading_rows=1,
)

uri = "gs://bucket/folder/file.xlsx"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Wait for the job to complete.

table = client.get_table(table_id)
print("Loaded {} rows to table {}".format(table.num_rows, table_id))
I am getting the following error, and it's complaining about a line that isn't even there:
BadRequest: 400 Error while reading data, error message: CSV table references column position 8, but line starting at position:660 contains only 1 columns.
I thought the problem was the data types, since I had originally declared ID, Value1, and Value2 as INTEGER and F as TIMESTAMP. I have now switched everything to STRING, and I still get the error.
My file is only 4 lines long in this test.
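A quick local sanity check can confirm what the loader is actually seeing: an .xlsx workbook is a ZIP archive under the hood, so its first bytes are the ZIP signature rather than text rows, and the "position" in the error is a byte offset into that binary archive. A minimal sketch (the file path in the commented lines is a placeholder):

```python
# .xlsx files are ZIP archives; their first four bytes are the
# ZIP local-file-header signature "PK\x03\x04".
XLSX_ZIP_MAGIC = b"PK\x03\x04"

def looks_like_xlsx(first_bytes: bytes) -> bool:
    """Return True if the byte prefix matches a ZIP/xlsx container."""
    return first_bytes.startswith(XLSX_ZIP_MAGIC)

# with open("file.xlsx", "rb") as f:     # path is a placeholder
#     print(looks_like_xlsx(f.read(4)))  # True for a real workbook
```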
Upvotes: 1
Views: 1949
Reputation: 1978
Excel files are not supported by BigQuery.

A few workaround solutions: convert the file to CSV first and load that instead (a bq load command will do, cf here).

Upvotes: 2
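Another workaround along the same lines, sketched under the assumption that pandas (with an Excel engine such as openpyxl) is installed; the file and table names mirror the question and are placeholders:

```python
import pandas as pd
from google.cloud import bigquery

# Read the workbook locally; dtype=str keeps every column as STRING,
# matching the all-STRING schema in the question.
df = pd.read_excel("file.xlsx", dtype=str)

# Hand the DataFrame straight to BigQuery -- no CSV in the bucket needed.
client = bigquery.Client()
job = client.load_table_from_dataframe(df, "project.dataset.my_table")
job.result()  # wait for the load job to finish
print("Loaded {} rows".format(job.output_rows))
```

This avoids the CSV parser entirely, since the client serializes the DataFrame itself before uploading.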
Reputation: 439
Your input file looks like a TSV, so you need to set the field delimiter to '\t', like this:
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField('ID', 'STRING'),
        bigquery.SchemaField('A', 'STRING'),
        bigquery.SchemaField('B', 'STRING'),
        bigquery.SchemaField('C', 'STRING'),
        bigquery.SchemaField('D', 'STRING'),
        bigquery.SchemaField('E', 'STRING'),
        bigquery.SchemaField('F', 'STRING'),
        bigquery.SchemaField('Value1', 'STRING'),
        bigquery.SchemaField('Value2', 'STRING'),
    ],
    skip_leading_rows=1,
    field_delimiter='\t',
)
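The effect is easy to reproduce locally with Python's csv module: splitting a tab-separated header on the default comma yields a single field, which matches the "contains only 1 columns" in the error message, while a tab delimiter recovers all nine columns. (This only helps if the file really is tab-separated text despite its .xlsx extension.) A small demonstration with the question's header row:

```python
import csv
import io

header = "ID\tA\tB\tC\tD\tE\tF\tValue1\tValue2"

# Default comma parsing: the whole line comes back as one field.
comma_cols = next(csv.reader(io.StringIO(header)))

# Tab-delimited parsing: all nine columns come back.
tab_cols = next(csv.reader(io.StringIO(header), delimiter="\t"))

print(len(comma_cols), len(tab_cols))  # 1 9
```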
Upvotes: 0