Jason L

Reputation: 31

How to escape the 'is a reserved keyword and cannot be used as field name' error in Spark SQL and Structured Streaming?

Currently, when I use Structured Streaming v2.1.0 + Kafka v0.10 for real-time log processing, I get Exception in thread "main" java.lang.UnsupportedOperationException: `package` is a reserved keyword and cannot be used as field name

My task requires two logic parts:

part#1. translate each log message, which contains a JSON-format string, into the corresponding case class via net.liftweb.json (see the parsing sketch after the case class below)

one of my case classes is defined as below:

    case class Mobile(val title: Option[String],
                      val desc: Option[String],
                      val adtype: Option[Int],
                      val apkname: Option[String],
                      @transient val `package`: Option[String],
                      val appstoreid: Option[String]
                     ) extends java.io.Serializable
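For reference, the parsing step of part#1 looks roughly like this (a minimal sketch; I'm assuming lift-json's DefaultFormats here, and myLogicToProcessLogs is the helper referenced in the streaming code below):

    import net.liftweb.json._

    // lift-json needs an implicit Formats in scope for extract[T]
    implicit val formats: Formats = DefaultFormats

    // parse one raw log line (a JSON string) into the Mobile case class;
    // extract[Mobile] binds the JSON field "package" to the `package` parameter
    def myLogicToProcessLogs(line: String): Mobile =
      parse(line).extract[Mobile]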

part#2. use Structured Streaming v2.1.0 + Kafka v0.10 for real-time processing:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.ProcessingTime

    val spark: SparkSession = SparkSession.
      builder().
      appName("structured streaming test").
      getOrCreate()

    val df = spark.
      readStream.
      format("kafka").
      option("kafka.bootstrap.servers", "localhost:9092").
      option("subscribe", "stream1").
      option("maxOffsetPerTrigger", "10000").
      load()

    import spark.implicits._

    val ds = df.
      // change the value's type from binary to STRING
      selectExpr("CAST(value AS STRING)").
      as[String].
      map(myLogicToProcessLogs)

    val query = ds.
      writeStream.
      outputMode("append").
//      format("console").
      trigger(ProcessingTime("10 seconds")).
      foreach(new HttpSink).
      start()

    query.awaitTermination()
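HttpSink is my custom sink; its rough shape is a ForeachWriter like the following (a hypothetical sketch, since the real implementation is omitted here):

    import org.apache.spark.sql.ForeachWriter

    // Spark calls open/process/close for each partition of each trigger
    class HttpSink extends ForeachWriter[Mobile] {
      override def open(partitionId: Long, version: Long): Boolean = true // e.g. set up an HTTP client
      override def process(record: Mobile): Unit = {
        // POST the record to the downstream HTTP endpoint (omitted)
      }
      override def close(errorOrNull: Throwable): Unit = () // release the client
    }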

The reason for the error is that my log messages include some Java reserved keywords like 'package', which always fail the Spark SQL encoder.

Note: by using backticks (`package`) to escape the Scala keyword check, and the @transient annotation to escape Java serialization, I can successfully translate the case class above into an RDD and do subsequent transformations and actions for batch processing without any error (see the batch sketch below). But how can I escape the keyword check in the Spark SQL encoder and Structured Streaming?
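For comparison, this batch path works, because RDD transformations never derive a Spark SQL Encoder (a minimal sketch; the input path is hypothetical):

    // batch processing via RDD: no Encoder is derived, so the `package` field is accepted
    val logs = spark.sparkContext.
      textFile("/path/to/logs"). // hypothetical input path
      map(myLogicToProcessLogs)

    logs.filter(_.`package`.isDefined).count() // transformations and actions run fine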

There is one related question: spark-submit fails when case class fields are reserved java keywords with backticks, but I can't do that, as the case class constructor parameter 'package' is still needed by lift-json for parsing.

I also found that other JSON tools like Gson provide JSON field naming support to map standard Java field names to different JSON field names: https://sites.google.com/site/gson/gson-user-guide#TOC-JSON-Field-Naming-Support. Is there a similar way in lift-json?
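The closest workaround I can see in lift-json is to transform the JValue before extraction, i.e. rename the field in the JSON instead of in the case class, so that neither lift-json nor the Spark SQL encoder ever sees `package` (a minimal sketch; MobileFixed and pkg are hypothetical names):

    import net.liftweb.json._

    // same shape as Mobile, but with the reserved name replaced by pkg
    case class MobileFixed(title: Option[String],
                           desc: Option[String],
                           adtype: Option[Int],
                           apkname: Option[String],
                           pkg: Option[String],
                           appstoreid: Option[String])

    implicit val formats: Formats = DefaultFormats

    // rename the incoming "package" field to "pkg" before extraction
    def parseLog(line: String): MobileFixed =
      parse(line).
        transformField { case JField("package", v) => JField("pkg", v) }.
        extract[MobileFixed]

With MobileFixed, the product encoder derived by spark.implicits._ would no longer hit a reserved keyword.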

Upvotes: 1

Views: 1068

Answers (0)
