Greg

Reputation: 21

Loading a CSV into BigQuery using bq

I'm trying to do something that I feel should be pretty simple, but I can't seem to get it to work. I created a table in BigQuery and am now trying to use the bq tool to import data into it. The file I'm trying to import into the table is a CSV that uses quotes around columns whose data could contain commas.

Table:

- ColA string
- ColB string
- ColC string
- ColD integer

File:

"ColA data","ColB data","ColC data", ColD data

Command

bq load --quote "" --format=csv dataset.TableName c:\path\filename.csv ColA:string,ColB:string,ColC:string,ColD:integer

The error I'm getting is: "Error while reading data, error message: Too many values in row starting at position: 164"

The curious thing is that position 164 happens to be the end of line 1 in the file I'm trying to import. If I delete everything after the first row, that row loads fine, but as soon as I put the other rows back in it fails again. Is there an option to specify that the file's row terminator is CR/LF? I would think that would be the default, since it's a standard CSV, but it was just a thought.
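(As a quick sanity check on the line-ending question: on a machine with a Unix-like shell, the file utility reports the terminators directly. filename.csv below is just a stand-in for the real file, and a CRLF-terminated file would show up something like this:)

$ file filename.csv
filename.csv: ASCII text, with CRLF line terminators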

Upvotes: 1

Views: 6619

Answers (1)

rsantiago

Reputation: 2099

A quick test using your information worked:

$ cat file.csv
"ColA data","ColB data","ColC data",22

$ bq load --quote "" --format=csv test_dataset.table_import_csv file.csv
Upload complete.
Waiting on bqjob_1 ... (0s) Current status: DONE   

$ bq query --use_legacy_sql=false 'select * from test_dataset.table_import_csv'
Waiting on bqjob_2 ... (0s) Current status: DONE

+-------------+-------------+-------------+------+
|    cola     |    colb     |    colc     | cold |
+-------------+-------------+-------------+------+
| "ColA data" | "ColB data" | "ColC data" |   22 |
+-------------+-------------+-------------+------+

$

Then I added an extra comma to the first row and got the same error you did: "Too many values in row starting at position: 40"
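To illustrate the mechanism: --quote "" disables the quote character, so a comma inside a quoted field is still treated as a column separator. A hypothetical row like the one below therefore parses as five values against a four-column table:

$ cat file.csv
"ColA, data","ColB data","ColC data",22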

To solve your issue, you need to guarantee that the string columns don't contain any commas, because the comma is the character used to separate columns in a CSV file.

The option to change the column delimiter is --field_delimiter; give it a try. From the documentation:

The character that indicates the boundary between columns in the data. Both \t and tab are allowed for tab delimiters.
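As a sketch of how that could look here (assuming the same pre-existing four-column table and placeholder file contents), the data file would use the new delimiter instead of commas, and the load command would pass it with --field_delimiter:

$ cat file.csv
ColA, data|ColB, data|ColC data|22

$ bq load --field_delimiter="|" --quote "" --format=csv test_dataset.table_import_csv file.csv

With a pipe as the delimiter, any commas inside the field values no longer split the row into extra columns.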

Upvotes: 1
