Reputation: 21
I'm trying to do something I feel like is pretty simple but can't seem to get it to work. I created a table in Bigquery and am now trying to use the bq tool to import data in to it. The file I'm trying to import in to the table is a CSV file that uses quotes on columns that have data that could have commas in them.
Table:
-ColA string -ColB string -ColC string -ColD integer
File:
"ColA data","ColB data","ColC data", ColD data
Command
bq load --quote "" --format=csv dataset.TableName c:\path\filename.csv ColA:string,ColB:string,ColC:string,ColD:integer
The error I'm getting is: "-Error while reading data, error message: Too many values in row starting at position: 164"
The curious thing is that position 164 happens to be the end of line 1 in the file I'm trying to import. If I delete all of the data it loads that first row, but if I put the other rows back in then it fails out again. Is there a option to set that the file is row terminator with CR/LF? I would think that is just the default because its a standard CSV but just a thought.
Upvotes: 1
Views: 6619
Reputation: 2099
A quick test using your information worked:
$ cat file.csv
"ColA data","ColB data","ColC data",22
$ bq load --quote "" --format=csv test_dataset.table_import_csv file.csv
Upload complete.
Waiting on bqjob_1 ... (0s) Current status: DONE
$ bq query --use_legacy_sql=false 'select * from test_dataset.table_import_csv'
Waiting on bqjob_2 ... (0s) Current status: DONE
+-------------+-------------+-------------+------+
| cola | colb | colc | cold |
+-------------+-------------+-------------+------+
| "ColA data" | "ColB data" | "ColC data" | 22 |
+-------------+-------------+-------------+------+
$
Then I add a new comma in the first row and got the same error as you: "Too many values in row starting at position: 40"
To solve your issue, you need to guaranty that string columns don't include any comma character, because this is the character used to identify new columns in a CSV file.
The option to change the column delimiter is --field_delimiter, try it out:
The character that indicates the boundary between columns in the data. Both \t and tab are allowed for tab delimiters.
Upvotes: 1