Reputation: 73
I'm working with BigQuery and experimenting with using it to query a CSV file in a bucket in Google Cloud Storage. I came across some strange behavior: only when there are 3 rows in the CSV, and the first row contains a string in any field, the first row is missing when I query the table.
I started with myfile.csv containing:
testin,2,2
testing3,3,4
I uploaded it to the bucket:
gsutil cp myfile.csv gs://bucket/
Then I created an external table over gs://bucket/myfile.csv:
bq mk --external_table_definition=Field1:STRING,Field2:STRING,Field3:INTEGER@CSV=gs://bucket/myfile.csv dataset.table
Querying the table returns both rows:
bq query "SELECT * FROM dataset.table;"
Waiting on biquery_job_id_1234567 ... (0s) Current status: DONE
+----------+--------+--------+
|  Field1  | Field2 | Field3 |
+----------+--------+--------+
| testin   | 2      | 2      |
| testing3 | 3      | 4      |
+----------+--------+--------+
Next, I updated myfile.csv to look like the following:
1,h,3
testin,2,2
testing3,3,4
I uploaded it to gs://bucket/myfile.csv:
gsutil cp myfile.csv gs://bucket/
Then I queried dataset.table again:
bq query "SELECT * FROM dataset.table;"
Waiting on bigquery_job_78901234 ... (0s) Current status: DONE
+----------+--------+--------+
|  Field1  | Field2 | Field3 |
+----------+--------+--------+
| testin   | 2      | 2      |
| testing3 | 3      | 4      |
+----------+--------+--------+
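The first row is missing from the query result, even though the file in the bucket contains all three rows:
gsutil cat gs://bucket/myfile.csv
1,h,3
testin,2,2
testing3,3,4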
However, if myfile.csv looks as follows:
1,2,3
testin,2,2
testing3,3,4
and I upload it and run the query again, all three rows are returned:
gsutil cp myfile.csv gs://bucket/
bq query "SELECT * FROM dataset.table;"
Waiting on bigquery_job_4567890 ... (0s) Current status: DONE
+----------+--------+--------+
|  Field1  | Field2 | Field3 |
+----------+--------+--------+
| 1        | 2      | 3      |
| testin   | 2      | 2      |
| testing3 | 3      | 4      |
+----------+--------+--------+
Does anyone have any insight as to what scenarios may cause the first row to become missing if it contains a string within the first 2 fields?
Thanks,
Upvotes: 3
Views: 1015
Reputation: 14786
There is a parameter called csvOptions.skipLeadingRows which is used to specify the number of "header rows" in a CSV file.
If skipLeadingRows is unspecified, BigQuery tries to autodetect the number of header rows. Setting skipLeadingRows manually to 0 should disable this behavior.
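As a rough sketch of one way to set this from the command line: instead of the inline schema@CSV=URI form, the external table can be described in a JSON table definition file that sets csvOptions.skipLeadingRows explicitly. The file name table_def.json is just a placeholder I picked; the schema and URI are copied from the question. I have not tested this against the asker's bucket.

```shell
# Write a table definition file with header skipping turned off.
# table_def.json is a hypothetical file name; any path works.
cat > table_def.json <<'EOF'
{
  "sourceFormat": "CSV",
  "sourceUris": ["gs://bucket/myfile.csv"],
  "schema": {
    "fields": [
      {"name": "Field1", "type": "STRING"},
      {"name": "Field2", "type": "STRING"},
      {"name": "Field3", "type": "INTEGER"}
    ]
  },
  "csvOptions": {
    "skipLeadingRows": 0
  }
}
EOF

# Then create the external table from the file instead of the inline
# definition (requires an authenticated bq CLI, so it is left commented out):
# bq mk --external_table_definition=table_def.json dataset.table
```

If dataset.table already exists from the earlier bq mk, it would presumably have to be removed first or the new table given a different name.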
Upvotes: 1