Reputation: 798
I'm trying to process a number of large JSON log files with Spark, but it fails every time with scala.MatchError, whether I supply a schema or not.
I just want to skip the lines that don't match the schema, but I can't find a way to do that in the Spark docs.
I know that writing a JSON parser and mapping it over an RDD of the files' lines would get things done, but I'd rather use sqlContext.read.schema(schema).json(fileNames).selectExpr(...)
because it's much easier to maintain.
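For reference, this is roughly the RDD-based workaround I mean (a sketch only): pre-filter each line with a plain Jackson parse and hand only the lines that parse to sqlContext.read.json, which accepts an RDD[String] in Spark 1.x. Note this only drops lines that aren't syntactically valid JSON; a record that parses but still disagrees with the schema could still cause trouble.

```scala
import scala.util.Try
import com.fasterxml.jackson.databind.ObjectMapper

// Keep only lines that are syntactically valid JSON.
// One ObjectMapper per partition, since it isn't serializable.
val validLines = sc.textFile(fileNames.mkString(","))
  .mapPartitions { lines =>
    val mapper = new ObjectMapper()
    lines.filter(line => Try(mapper.readTree(line)).isSuccess)
  }

// sqlContext.read.json accepts an RDD[String] in Spark 1.x
val df = sqlContext.read.schema(schema).json(validLines)
```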
Upvotes: 1
Views: 516
Reputation: 25929
This will be fixed in Spark 1.6.1: https://issues.apache.org/jira/browse/SPARK-12057
For now you can compile a version of Spark that includes the fix. Essentially, it raises a parsing exception instead of a general MatchError when a record can't be parsed, and then reports the record as corrupt; see the code in JacksonParser: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala
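Once you're on a build with the fix, skipping bad lines could look roughly like this (a sketch, assuming the default corrupt-record column name _corrupt_record, which is configurable via spark.sql.columnNameOfCorruptRecord, and assuming schema is the StructType from the question): add a string field with that name to your schema so malformed records land there, then filter out the rows where it is non-null.

```scala
import org.apache.spark.sql.types._

// Sketch: extend the user schema with a _corrupt_record string field so
// records that fail to parse are captured there instead of failing the job.
val schemaWithCorrupt = StructType(
  schema.fields :+ StructField("_corrupt_record", StringType, nullable = true))

val df = sqlContext.read.schema(schemaWithCorrupt).json(fileNames)

// Keep only the rows that parsed cleanly, then drop the helper column.
val clean = df.filter(df("_corrupt_record").isNull).drop("_corrupt_record")
```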
Upvotes: 0