Reputation: 1
I have a Spark setup with partitioned Parquet data, and queries are actively running against these partitions. A background job rewrites the Parquet files to get better compression, which changes the Parquet file layout. How can I make these file overwrites atomic, so that they neither fail nor cause data-integrity issues for the running Spark queries? What are the possible solutions?
We cannot use a data lakehouse because of legacy constraints.
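For context, the background job does roughly the following (the paths, partitioning scheme, codec, and file counts here are just placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-compaction").getOrCreate()

# Placeholder paths; the real table is partitioned by date.
src = "hdfs:///warehouse/events/dt=2024-01-01"
staging = "hdfs:///warehouse/_staging/events/dt=2024-01-01"

# Rewrite one partition's Parquet files with a different codec and layout.
df = spark.read.parquet(src)

(df.coalesce(4)                      # fewer, larger files
   .write
   .option("compression", "zstd")    # better compression than the original codec
   .mode("overwrite")
   .parquet(staging))

# The open question: how to move the result from `staging` back over `src`
# atomically while queries are still reading `src`.
```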
Upvotes: 0
Views: 97
Reputation: 1757
This is an open-ended question without more details about the use case, but I can offer a few thoughts:
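One pattern that avoids overwriting files in place is to write the compacted files to a new directory and then repoint the partition at it, so readers never see a half-written directory. The sketch below assumes a Hive-partitioned table registered in a metastore; the table, partition, and location names are placeholders, and this is an outline rather than a drop-in solution:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("atomic-partition-swap")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical table/partition names and locations.
table = "db.events"
partition = "dt='2024-01-01'"
new_location = "hdfs:///warehouse/_staging/events/dt=2024-01-01"

# Repoint the partition at the directory holding the compacted files.
# This is a metastore-only change: queries planned after this point read
# the new files, while queries that are already running keep reading the
# old directory.
spark.sql(
    f"ALTER TABLE {table} PARTITION ({partition}) SET LOCATION '{new_location}'"
)

# Invalidate any cached file listings so new queries pick up the change.
spark.sql(f"REFRESH TABLE {table}")

# Delete the old partition directory only after a grace period longer than
# your longest-running query; removing it too early makes in-flight tasks
# fail with FileNotFoundException.
```

If your queries read Parquet directly by path rather than through a metastore table, a location swap won't help; you would instead need an atomic directory rename (HDFS rename is atomic within a filesystem, while an S3 "rename" is copy-and-delete and is not) or some indirection layer that tells readers which directory is current. Either way, keep the old files around until every query that could have planned against them has finished.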
Upvotes: 0