Dobes Vandermeer

Reputation: 8810

How can I figure out why BigQuery is rejecting my parquet file?

When trying to upload a parquet file into BigQuery, I get this error:

Error while reading data, error message: Read less values than expected from: prod-scotty-45ecd3eb-e041-450c-bac8-3360a39b6c36; Actual: 0, Expected: 10 

I don't know why I get the error.

I tried inspecting the file with parquet-tools and it prints the file contents without issues.

The parquet file is written using the parquetjs JavaScript library.

Update: I also filed this in the BigQuery issue tracker here: https://issuetracker.google.com/issues/145797606

Upvotes: 2

Views: 1130

Answers (2)

Parth Mehta

Reputation: 1907

From the error message it seems like a rogue line break might be causing this.

We use Dataprep to clean up our data, and it works quite well. If I am not mistaken, it is also Google's recommended method of cleaning up and sanitising data for BigQuery.

https://cloud.google.com/dataprep/docs/html/BigQuery-Data-Type-Conversions_102563896

Upvotes: 1

Dobes Vandermeer

Reputation: 8810

It turns out BigQuery doesn't support the latest version of the Parquet format. I changed the writer so that it no longer produces the version 2 format, and BigQuery accepted the file.
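
For anyone hitting the same error, here is a rough sketch of what that change might look like with parquetjs. It assumes that `ParquetWriter.openFile(schema, path, opts)` accepts a `useDataPageV2` option and that this is the "version 2" setting in question; please check that against the version you have installed. The `ParquetModule` type and the `openV1Writer` helper are hypothetical names used only for illustration, and the module is passed in as an argument so the snippet does not depend on how you import it.

```typescript
// Minimal structural type covering only the part of parquetjs used here.
// Assumption: openFile takes an options object with `useDataPageV2`.
interface ParquetModule<S, W> {
  ParquetWriter: {
    openFile(
      schema: S,
      path: string,
      opts?: { useDataPageV2?: boolean }
    ): Promise<W>;
  };
}

// Hypothetical helper: opens a writer with version 2 data pages turned off,
// so the resulting file uses the older page format.
function openV1Writer<S, W>(
  parquet: ParquetModule<S, W>,
  schema: S,
  path: string
): Promise<W> {
  return parquet.ParquetWriter.openFile(schema, path, { useDataPageV2: false });
}
```

In a real script you would call something like `await openV1Writer(require('parquetjs'), schema, 'out.parquet')`, then append rows and close the writer as usual before loading the file into BigQuery.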

Upvotes: 2
