Reputation: 361
I recently found that when Spark SQL reads Parquet files generated by Presto, it does not return all of the rows. For example, for a Presto table stored in Parquet format that contains 1000 rows, Spark SQL returns only 400 rows.
I did some investigation and found that for a Parquet file with 2 row groups, Spark SQL returns the data from only 1 row group, and for another file with 1 row group, it returns an empty result.
Has anyone encountered the same issue before and found the cause?
I am using Spark 3.4.1 and Presto 419.
Upvotes: 0
Views: 31