Russell Bie

Spark misses some rows when reading Parquet files generated by Presto

Recently I found that when Spark SQL reads Parquet files generated by Presto, some rows are missing. For example, for a Presto table stored as Parquet that contains 1000 rows, Spark SQL returns only 400 rows.

I also did some investigation and found that for a Parquet file with 2 row groups, Spark SQL returns the data of only 1 row group, and for another file with 1 row group it returns an empty result.
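For anyone who wants to reproduce the check, the row counts recorded in a file's footer can be read independently of Spark. The following is only a minimal sketch: it assumes pyarrow is installed, and the function names `load_footer` and `row_group_report` as well as the file path in the usage comment are hypothetical placeholders, not part of my actual setup.

```python
def load_footer(path):
    """Read only the Parquet footer metadata of one file.

    Assumes pyarrow is installed; it is imported lazily so that
    row_group_report() below has no third-party dependency.
    """
    import pyarrow.parquet as pq
    return pq.ParquetFile(path).metadata


def row_group_report(metadata):
    """Return (total rows in footer, list of row counts per row group).

    `metadata` is expected to look like pyarrow's FileMetaData:
    it needs `num_rows`, `num_row_groups` and `row_group(i).num_rows`.
    """
    per_group = [
        metadata.row_group(i).num_rows
        for i in range(metadata.num_row_groups)
    ]
    return metadata.num_rows, per_group


# Example usage (hypothetical path):
#   total, per_group = row_group_report(load_footer("/path/to/part-0.parquet"))
#   print(total, per_group)
```

If the footer totals add up to the expected row count while Spark returns fewer rows, the files themselves list all the data and the rows are being dropped on the read side.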

Has anyone encountered the same issue before and found the cause?

My Spark version is 3.4.1 and my Presto version is 419.
