Reputation: 175
We are working with Apache Spark. We save JSON files as gzip-compressed Parquet files in HDFS. However, when reading them back into a DataFrame, some files (but not all) raise the following exception:
ERROR Executor: Exception in task 2.0 in stage 72.0 (TID 88)
org.apache.parquet.io.ParquetDecodingException: Can not read value at 351 in
block 0 in file file:/path/to/file [...]
Caused by: java.lang.ClassCastException:
org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to
org.apache.spark.sql.catalyst.expressions.MutableDouble
Any help is much appreciated!
Upvotes: 3
Views: 6518
Reputation: 56
This kind of error occurs when you try to read Parquet files that were written with different schemas in a single read. For example, if a numeric field was inferred as a long in one file and as a double in another, Spark fails with exactly this `ClassCastException`. Make sure all your source files share the same schema, or convert them all in one job so a single consistent schema is used.
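A minimal sketch of how the conflict can arise in the first place (this assumes the asker's JSON data, which we haven't seen, mixes integer and decimal literals in the same field): JSON schema inference treats `10` and `10.5` as different types, so two files can end up with `LongType` vs `DoubleType` for the same column.

```python
import json

# Hypothetical contents of two JSON source files; the field names are
# illustrative, not from the original question.
file_a = '{"price": 10}'    # Spark would infer this field as a long
file_b = '{"price": 10.5}'  # Spark would infer this field as a double

type_a = type(json.loads(file_a)["price"]).__name__
type_b = type(json.loads(file_b)["price"]).__name__

print(type_a, type_b)  # -> int float (maps to LongType vs DoubleType)
```

One way to avoid this is to stop relying on inference: pass an explicit schema when reading (e.g. `spark.read.schema(mySchema).json(...)`, with the numeric field declared as `DoubleType`) before writing the Parquet output, so every file is written with the same types.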
Upvotes: 4