Tony Ng

Reputation: 164

Is there an efficient way to merge parquet files?

Context:

I understand there has been a question that was asked approximately 4 years ago about this:

Effectively merge big parquet files

Question:

However, I was wondering whether there are any good solutions for merging a large number of sizeable parquet files into one, besides provisioning a large Spark job to read them in and write them out?

Thanks!

Upvotes: 3

Views: 4194

Answers (1)

Ged

Reputation: 18108

Leaving the Delta API aside, there is no newer, changed approach. The Spark approach of reading in and writing out still applies.

One must be careful: the small-files problem is an issue for CSV and for loading, but once the data is at rest, file skipping, block skipping, and the like are better served by having more than just a few files.

Upvotes: 1
