Reputation: 8810
When trying to upload a parquet file into BigQuery, I get this error:
Error while reading data, error message: Read less values than expected from: prod-scotty-45ecd3eb-e041-450c-bac8-3360a39b6c36; Actual: 0, Expected: 10
I don't know why I get the error.
I tried inspecting the file with parquet-tools and it prints the file contents without issues.
The parquet file is written using the parquetjs JavaScript library.
Update: I also filed this in the BigQuery issue tracker here: https://issuetracker.google.com/issues/145797606
Upvotes: 2
Views: 1130
Reputation: 1907
From the error message it seems like a rogue line break might be causing this.
We use Dataprep to clean up our data, and it works quite well. If I'm not mistaken, it's also Google's recommended way of cleaning up and sanitising data for BigQuery.
https://cloud.google.com/dataprep/docs/html/BigQuery-Data-Type-Conversions_102563896
Upvotes: 1
Reputation: 8810
It turns out BigQuery doesn't support the latest version of the Parquet format. I changed the writer so that it no longer produces the version 2 format, and BigQuery then accepted the file.
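For anyone hitting the same error, here is a minimal sketch of what the change can look like with parquetjs. It assumes the writer accepts a `useDataPageV2` option (which I believe defaults to `true`); check this against the version of the library you have installed. The helper name `bigQueryWriterOptions`, the schema, and the sample row are hypothetical and only there for illustration.

```javascript
// Sketch: write a Parquet file with parquetjs using version 1 data pages.
// Assumption: the writer honours a `useDataPageV2` option.

// Hypothetical helper: keeps the caller's options but always forces V1 data pages.
function bigQueryWriterOptions(extra = {}) {
  return { ...extra, useDataPageV2: false };
}

// Hypothetical example schema and row; replace them with your own.
async function writeExample(path) {
  // Loaded lazily so the helper above can be used without the library present.
  const parquet = require("parquetjs");

  const schema = new parquet.ParquetSchema({
    name: { type: "UTF8" },
    quantity: { type: "INT64" },
  });

  const writer = await parquet.ParquetWriter.openFile(
    schema,
    path,
    bigQueryWriterOptions()
  );
  await writer.appendRow({ name: "apples", quantity: 10 });
  await writer.close();
}
```

Calling `await writeExample("example.parquet")` should produce a file that you can then try loading into BigQuery again.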
Upvotes: 2