Tsu Kernik

Reputation: 175

Spark error reading Parquet

We are working with Apache Spark and save JSON files as gzip-compressed Parquet files in HDFS. However, when reading them back to generate a DataFrame, some files (but not all) raise the following exception:

ERROR Executor: Exception in task 2.0 in stage 72.0 (TID 88)
org.apache.parquet.io.ParquetDecodingException: Can not read value at 351 in block 0 in file file:/path/to/file [...]
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableDouble

Any help is much appreciated!

Upvotes: 3

Views: 6518

Answers (1)

Thirubalan

Reputation: 56

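This kind of error occurs when you read, in a single operation, Parquet files whose schemas differ. The ClassCastException in your trace (MutableLong cannot be cast to MutableDouble) suggests that one file stores a column as a long while another stores the same column as a double. This can happen when the schema is inferred from JSON separately for each file: a column that happens to contain only whole numbers in one file is inferred as long. Make sure all your source files end up with the same schema, either by converting each of them with the same explicit schema or by converting all of them together in one job.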
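
As a rough sketch of the above (not necessarily how your job is set up), the Python helpers below first look for columns whose type differs between files and then redo the conversion with a fixed schema. The function names, the file paths and the column names in the usage note are made up for illustration. `spark` is assumed to be an existing SparkSession, and passing the schema as a DDL string assumes a reasonably recent Spark version.

```python
from collections import defaultdict


def find_type_conflicts(dtypes_by_file):
    """Return {column: {type: [files]}} for columns whose type differs across files.

    dtypes_by_file maps a file path to a list of (column, type) pairs,
    the same shape that DataFrame.dtypes returns in PySpark.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for path, dtypes in dtypes_by_file.items():
        for column, type_name in dtypes:
            seen[column][type_name].append(path)
    # Keep only the columns that were seen with more than one type.
    return {
        column: {t: sorted(files) for t, files in types.items()}
        for column, types in seen.items()
        if len(types) > 1
    }


def collect_dtypes(spark, paths):
    """Read each Parquet file on its own and record its column types."""
    return {path: spark.read.parquet(path).dtypes for path in paths}


def json_to_parquet(spark, json_path, parquet_path, schema):
    """Convert JSON to gzip-compressed Parquet using an explicit schema,
    so that the column types are not inferred separately for each file."""
    df = spark.read.schema(schema).json(json_path)
    df.write.option("compression", "gzip").parquet(parquet_path)
```

For example, find_type_conflicts(collect_dtypes(spark, paths)) would list the offending columns together with the files behind each type, and json_to_parquet(spark, "in.json", "out.parquet", "id BIGINT, price DOUBLE") would rewrite one of them with a fixed schema (the paths and column names here are hypothetical).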

Upvotes: 4
