Spark misses some rows when reading parquet files generated by presto

Question

Recently, I found that when Spark SQL reads Parquet files generated by Presto, not all rows are returned. For example, for a Presto table stored as Parquet that contains 1000 rows, Spark SQL returns only 400 rows.

I did some investigation and found that for a Parquet file with 2 row groups, Spark SQL reads the data of only 1 row group, and for another file with 1 row group, it returns an empty result.
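For anyone who wants to reproduce the check, this is roughly how the row groups of a single file can be inspected outside Spark. It is only a minimal sketch: it assumes the third-party pyarrow package is installed, and the helper names row_group_counts and describe_file are made up for illustration.

```python
def row_group_counts(metadata):
    """Return the row count of each row group.

    `metadata` is expected to behave like pyarrow's FileMetaData:
    it exposes `num_row_groups` and `row_group(i).num_rows`.
    """
    return [metadata.row_group(i).num_rows
            for i in range(metadata.num_row_groups)]


def describe_file(path):
    """Read only the Parquet footer of `path` and summarise it."""
    # Imported lazily so the helper above works without pyarrow.
    import pyarrow.parquet as pq

    metadata = pq.ParquetFile(path).metadata
    return {
        "created_by": metadata.created_by,  # writer string stored in the footer
        "total_rows": metadata.num_rows,    # rows across all row groups
        "row_groups": row_group_counts(metadata),
    }
```

Calling describe_file on one of the files written by Presto (the path is whatever the table's data file is) gives the per-row-group counts, which can then be compared with what Spark SQL returns for the same file, for example via spark.read.parquet(path).count().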

Has anyone encountered the same issue before and found the cause?

My Spark version is 3.4.1 and my Presto version is 419.
