Reputation: 164
Context:
I understand a question about this was asked approximately 4 years ago:
Effectively merge big parquet files
Question:
However, I was wondering whether there are any good solutions out there for merging a large number of big parquet files into one file, besides provisioning a large Spark job to read them in and then write them out?
Thanks!
Upvotes: 3
Views: 4194
Reputation: 18108
Leaving the Delta Lake API aside, there is no changed, newer approach. The Spark approach of reading in and writing out still applies.
One must be careful here: the small-files problem is an issue for CSV ingestion and loading, but once the data is at rest, file skipping, block skipping and the like are actually aided by having more than just a few files.
Upvotes: 1