Tail of Godzilla

Reputation: 551

Is it possible to merge two Parquet directories on HDFS?

I have two Parquet directories on HDFS with the same schema. I want to merge them into a single Parquet directory so that I can create an external Hive table from it.

I have googled my problem, but almost all the results are about merging small Parquet files into larger Parquet files.

Upvotes: 0

Views: 318

Answers (1)

Zoltan

Reputation: 3105

As long as the Parquet files have the same schema, you can simply put them in the same directory. Hive reads every file it finds in an external table's directory (except a few special files with specific names), so once the data is there, Hive will find it. (In older Hive versions this was true for non-external tables as well. In newer Hive versions it holds only for external tables, so you should not tamper with the contents of so-called managed tables.)
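
One practical catch when moving files between two directories is that both may contain files with the same name (for example `part-00000...`). Below is a minimal sketch of how the move could be scripted, not a definitive implementation. The helper names `plan_moves` and `run_moves`, the `merged_` prefix, and the example paths are all hypothetical; the sketch also assumes that names starting with `_` or `.` (such as `_SUCCESS`) are marker files rather than data, and that the `hdfs` command-line client is on the PATH.

```python
import posixpath
import subprocess


def plan_moves(src_dir, src_files, dst_dir, dst_files, prefix="merged_"):
    """Return (source_path, target_path) pairs for moving data files into dst_dir.

    src_files and dst_files are plain file names (no directory part).
    """
    taken = set(dst_files)
    moves = []
    for name in src_files:
        # Assumption: names starting with "_" or "." are markers, not data.
        if name.startswith(("_", ".")):
            continue
        target = name
        counter = 0
        # Rename on collision so no existing file in dst_dir is overwritten.
        while target in taken:
            counter += 1
            target = f"{prefix}{counter}_{name}"
        taken.add(target)
        moves.append(
            (posixpath.join(src_dir, name), posixpath.join(dst_dir, target))
        )
    return moves


def run_moves(moves):
    """Execute the planned moves with the HDFS command-line client."""
    for src, dst in moves:
        subprocess.run(["hdfs", "dfs", "-mv", src, dst], check=True)


# Example with hypothetical paths; only the plan is computed here.
example = plan_moves(
    "/data/a", ["part-0.parquet", "_SUCCESS"],
    "/data/b", ["part-0.parquet"],
)
```

In this example, `example` holds a single move from `/data/a/part-0.parquet` to `/data/b/merged_1_part-0.parquet`; passing it to `run_moves` would perform the move. Planning and execution are kept separate so the plan can be inspected before anything on HDFS changes.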

Upvotes: 2
