Sudha

Reputation: 339

CorruptStatistics - Warning messages while using Parquet files

I get a ton of these messages when I execute a query in Hive on Hortonworks.

INFO: org.apache.parquet.CorruptStatistics: Ignoring statistics because this file was created prior to 1.8.0, see PARQUET-251

  1. How can I fix this?
  2. If I don't fix it, what are the repercussions? I am getting correct results in spite of these warnings.

Upvotes: 1

Views: 1469

Answers (1)

Uwe L. Korn

Reputation: 8796

  1. Re-write the files with a Parquet producer (for example, Hive) that uses a newer parquet-mr library, 1.8.0 or later. The new files will then contain correct statistics.
  2. The results you get from these Parquet files are correct. The message only tells you that the reader is ignoring the statistics stored in the file, so it cannot apply every optimization while working on it. An earlier parquet-mr version had a bug in how it computed statistics (PARQUET-251). The bug is now fixed, but to get correct statistics, which are used only for query optimization, you need to re-write all affected files with a newer version. The data in the files is not affected by this bug.
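If you want to find out which files are worth re-writing, one option is to look at the `created_by` string in each file's footer. The snippet below is only a rough sketch of that idea, not the check parquet-mr itself performs. The function name `needs_rewrite` is made up for illustration, and the assumed string format (`parquet-mr version X.Y.Z (build ...)`) will not match every writer.

```python
import re

# Assumed shape of the footer's created_by string, for example
# "parquet-mr version 1.6.0 (build abc123)". Other writers may use
# different formats, which this sketch treats as "unknown".
CREATED_BY = re.compile(
    r"^(?P<app>\S+) version (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
)


def needs_rewrite(created_by: str) -> bool:
    """Return True if the file was positively identified as written by
    parquet-mr older than 1.8.0 (hypothetical helper, simplified)."""
    match = CREATED_BY.match(created_by or "")
    if match is None:
        # Unparseable or missing string: this sketch does not guess.
        return False
    if match.group("app") != "parquet-mr":
        # The message in the question refers to parquet-mr versions only.
        return False
    version = tuple(int(match.group(k)) for k in ("major", "minor", "patch"))
    return version < (1, 8, 0)
```

If pyarrow is available, the string can be read with `pyarrow.parquet.ParquetFile(path).metadata.created_by` and passed to the function. Files it flags are the candidates for re-writing; everything else can be left alone.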

Upvotes: 1
