chinayangyongyong

Reputation: 37

Parsing non-canonical JSON in Scala

As the data below shows, the elements of the languages array have different value types (objects mixed with a bare string). How can I get the desired output?

package ceshi

import scala.util.parsing.json.JSON

object ceshi1212 {
  def main(args: Array[String]): Unit = {


    class CC[T] {
      def unapply(a: Any): Option[T] = Some(a.asInstanceOf[T])
    }

    object M extends CC[Map[String, Any]]
    object L extends CC[List[Any]]
    object S extends CC[String]

    val jsonString =
      """
      {
        "languages": [
        {
            "name": "English",
            "is_active": "true",
            "completeness": "asdf"
        },
        {
            "name": "Latin",
            "is_active": "asdf",
            "completeness": "232"
        }
            ,{
                "name": "Latin",
                "is_active": "0009"
            }
            ,
            "error"
                  ]
      }
    """.stripMargin
    // Irregular JSON: the "error" element has a different type from the other array elements. How can parsing skip it automatically?
    val result = for {
      Some(M(map)) <- List(JSON.parseFull(jsonString))
      L(languages) = map("languages")
      M(language) <- languages
      S(name) = language("name")
      S(active) = language("is_active")
      S(completeness) = language.getOrElse("completeness","--")
    } yield {
      (name, active, completeness)
    }

    println(result)
    // I want the result to be: List((English,true,asdf), (Latin,asdf,232), (Latin,0009,""))

  }
}
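A likely reason the loop above does not skip "error": because of type erasure, asInstanceOf[T] inside CC is not checked at runtime, so CC.unapply returns Some for any value, including a String. The self-contained sketch below reproduces this with the same extractor shape (the wrapper object ErasureDemo is a hypothetical name). The mismatch presumably only surfaces as a ClassCastException once the value is used as a Map, for example at language("name").

```scala
object ErasureDemo {
  // Same extractor shape as in the question. T is erased at runtime, so
  // asInstanceOf[T] performs no check and unapply always returns Some.
  class CC[T] {
    def unapply(a: Any): Option[T] = Some(a.asInstanceOf[T])
  }
  object M extends CC[Map[String, Any]]

  def main(args: Array[String]): Unit = {
    val matched = M.unapply("error")
    println(matched.isDefined) // true: the string "error" is accepted as a "Map"

    // The mismatch only shows up once the value is actually used as a Map.
    val attempt = scala.util.Try {
      val m: Map[String, Any] = matched.get
      m("name")
    }
    println(attempt.isFailure) // true, caused by a ClassCastException
  }
}
```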

I want to get this result: List((English,true,asdf), (Latin,asdf,232), (Latin,0009,""))

Notes:

1. The string is not always at the end of the array; its position is not known in advance.
2. The three keys I need may not all be present.
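One way to keep the question's pattern-matching approach is to match on checked runtime types instead of using unchecked casts. This is a minimal sketch, assuming JSON.parseFull returns objects as Map[String, Any], arrays as List[Any] and strings as String; the parsed value is built by hand so no parser is needed, and the names SkipNonObjects, extract and str are hypothetical. Missing or non-string fields fall back to "", as in the desired output.

```scala
object SkipNonObjects {
  // Read a string field; a missing or non-string value becomes "".
  def str(obj: Map[String, Any], key: String): String = obj.get(key) match {
    case Some(s: String) => s
    case _               => ""
  }

  // Walk the value JSON.parseFull is assumed to return and build one
  // (name, is_active, completeness) tuple per object in "languages".
  def extract(parsed: Option[Any]): List[(String, String, String)] = parsed match {
    case Some(root: Map[_, _]) =>
      // keys of a parsed JSON object are strings, so this cast is assumed safe
      root.asInstanceOf[Map[String, Any]].get("languages") match {
        case Some(items: List[_]) =>
          // collect keeps only real maps, so a stray "error" string is
          // dropped wherever it sits in the array
          items
            .collect { case m: Map[_, _] => m.asInstanceOf[Map[String, Any]] }
            .map(obj => (str(obj, "name"), str(obj, "is_active"), str(obj, "completeness")))
        case _ => Nil
      }
    case _ => Nil
  }

  def main(args: Array[String]): Unit = {
    // hand-built stand-in for JSON.parseFull(jsonString), with "error" moved first
    val parsed: Option[Any] = Some(Map(
      "languages" -> List(
        "error",
        Map("name" -> "English", "is_active" -> "true", "completeness" -> "asdf"),
        Map("name" -> "Latin", "is_active" -> "asdf", "completeness" -> "232"),
        Map("name" -> "Latin", "is_active" -> "0009")
      )
    ))
    println(extract(parsed))
  }
}
```

Because the check happens in collect, the position of the bad element in the array does not matter.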

Upvotes: 0

Views: 133

Answers (2)

tentacle

Reputation: 553

If you can switch your JSON library to circe, you can handle this kind of bad data.

Given this data model:

import io.circe.generic.semiauto._
import io.circe.parser.decode
import io.circe.{Decoder, Json}

case class Languages(languages: Seq[Language])
case class Language(name: String, is_active: String, completeness: Option[String])

you can define a fault-tolerant Seq decoder that skips elements it cannot decode instead of failing the whole parse:

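// Each element is decoded as A if possible (Left), otherwise as raw Json (Right);
// keeping only the Lefts silently drops elements that are not valid A values.
def tolerantSeqDecoder[A: Decoder]: Decoder[Seq[A]] =
  Decoder.decodeSeq(Decoder[A].either(Decoder[Json]))
    .map(_.flatMap(_.left.toOption))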

Then the rest of the code:

val jsonString = """
  {
    "languages": [
    {
        "name": "English",
        "is_active": "true",
        "completeness": "asdf"
    },
    {
        "name": "Latin",
        "is_active": "asdf",
        "completeness": "232"
    },
    {
      "name": "Latin",
      "is_active": "0009"
    },
    "error"
  ]
  }
"""

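val languageDecoder: Decoder[Language] = deriveDecoder[Language]
// picked up by deriveDecoder[Languages] for the Seq[Language] field
implicit val tolerantDecoder: Decoder[Seq[Language]] = tolerantSeqDecoder[Language](languageDecoder)
implicit val languagesDecoder: Decoder[Languages] = deriveDecoder[Languages]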

val parsed = decode[Languages](jsonString)
println(parsed)

Output:

Right(Languages(List(Language(English,true,Some(asdf)), Language(Latin,asdf,Some(232)), Language(Latin,0009,None))))

This approach was suggested by one of the circe developers: How do I ignore decoding failures in a JSON array?
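The idea behind tolerantSeqDecoder can be illustrated without circe: decode each element separately, keep the successes, and fail only if the value is not an array at all. In the circe snippet, Decoder[A].either(Decoder[Json]) yields Left for elements that decode as A and Right(json) for everything else, and _.left.toOption keeps the Lefts. The sketch below instead models a decoder as a plain function returning Either, where Right means success; the type Dec and the names tolerantSeq and decodeName are hypothetical, not circe API.

```scala
object TolerantSeqSketch {
  // Hypothetical stand-in for a JSON decoder: a raw parsed value (Map, List,
  // String, ...) goes in; Left(error) or Right(result) comes out.
  type Dec[A] = Any => Either[String, A]

  // Decode each array element on its own and keep only the successes;
  // fail only if the value is not an array at all.
  def tolerantSeq[A](dec: Dec[A]): Dec[Seq[A]] = {
    case xs: Seq[_] => Right(xs.flatMap(x => dec(x).toOption))
    case other      => Left(s"expected an array, got: $other")
  }

  // Example element decoder: succeeds only for an object with a string "name".
  val decodeName: Dec[String] = {
    case m: Map[_, _] =>
      m.asInstanceOf[Map[String, Any]].get("name") match {
        case Some(s: String) => Right(s)
        case _               => Left(s"no string name in: $m")
      }
    case other => Left(s"expected an object, got: $other")
  }

  def main(args: Array[String]): Unit = {
    val raw = List(Map("name" -> "English"), "error", Map("name" -> "Latin"))
    println(tolerantSeq(decodeName)(raw))     // the "error" element is dropped
    println(tolerantSeq(decodeName)("error")) // not an array, so a Left
  }
}
```

The design choice mirrors the circe version: an individual bad element is ignored, but a structurally wrong document (no array where one is expected) is still reported as an error.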

Upvotes: 0

d-xa

Reputation: 524

As mentioned in the comments, there are other JSON libraries worth recommending. Have a look at this post for an overview: What JSON library to use in Scala?

Answer using a specific library (play-json)

Personally, I recommend the Play JSON framework. To get the result you described with Play JSON, your code might look like this:

import play.api.libs.json._

val json: JsValue = Json.parse(jsonString)
// keep only the JSON objects, so a stray string such as "error" is skipped
// wherever it appears in the array
val list = (json \ "languages").as[Seq[JsValue]].collect { case o: JsObject => o }

val names = list.map(x => (x \ "name").validate[String] match {
  case JsSuccess(v, _) => v
  case _ => ""
})

val isActives = list.map(x => (x \ "is_active").validate[String] match {
  case JsSuccess(v, _) => v
  case _ => ""
})

// "completeness" can be missing, in which case "" is used instead
val completeness = list.map(x => (x \ "completeness").validate[String] match {
  case JsSuccess(v, _) => v
  case _ => ""
})

// one tuple per language object, so no hard-coded tuple count is needed
val res = for (idx <- list.indices) yield (names(idx), isActives(idx), completeness(idx))
res.toList
// List[(String, String, String)] = List((English,true,asdf), (Latin,asdf,232), (Latin,0009,""))

The Play JSON framework also has very good documentation: https://www.playframework.com/documentation/2.8.x/ScalaJson

Upvotes: 1
