Reputation: 37
The following data can be seen with different value types. How can I get the desired output?
package ceshi
import scala.util.parsing.json.JSON
object ceshi1212 {
def main(args: Array[String]): Unit = {
class CC[T] {
def unapply(a: Any): Option[T] = Some(a.asInstanceOf[T])
}
object M extends CC[Map[String, Any]]
object L extends CC[List[Any]]
object S extends CC[String]
val jsonString =
"""
{
"languages": [
{
"name": "English",
"is_active": "true",
"completeness": "asdf"
},
{
"name": "Latin",
"is_active": "asdf",
"completeness": "232"
}
,{
"name": "Latin",
"is_active": "0009"
}
,
"error"
]
}
""".stripMargin
// 不规则json error和并列的数组类型不同 怎么解析自动跳过?
val result = for {
Some(M(map)) <- List(JSON.parseFull(jsonString))
L(languages) = map("languages")
M(language) <- languages
S(name) = language("name")
S(active) = language("is_active")
S(completeness) = language.getOrElse("completeness","--")
} yield {
(name, active, completeness)
}
println(result)
//i want result is: List((English,true,asdf), (Latin,asdf,232),(Latain,0009,""))
}
}
i want get result is List((English,true,asdf), (Latin,asdf,232),(Latain,0009,"")) note: 1 The string is not always at the end of the array, and the position is indeterminate 2 The three keys I need may not be complete
Upvotes: 0
Views: 133
Reputation: 553
If you can switch parser library to circe, you can deal with this types of bad data.
Given you have data model
import io.circe.generic.semiauto._
import io.circe.parser.decode
import io.circe.{Decoder, Json}
case class Languages(languages: Seq[Language])
case class Language(name: String, is_active: String, completeness: Option[String])
You can define a fault-tolerant seq decoder that would skip bad data rather than crash whole parse
def tolerantSeqDecoder[A: Decoder]: Decoder[Seq[A]] = Decoder.decodeSeq(Decoder[A]
.either(Decoder[Json])).map(_.flatMap(_.left.toOption))
and the rest...
val jsonString = """
{
"languages": [
{
"name": "English",
"is_active": "true",
"completeness": "asdf"
},
{
"name": "Latin",
"is_active": "asdf",
"completeness": "232"
},
{
"name": "Latin",
"is_active": "0009"
},
"error"
]
}
"""
val languageDecoder = deriveDecoder[Language]
implicit val tolerantDecoder = tolerantSeqDecoder[Language](languageDecoder)
implicit val languagesDecoder = deriveDecoder[Languages]
val parsed = decode[Languages](jsonString)
println(parsed)
out:
Right(Languages(List(Language(English,true,Some(asdf)), Language(Latin,asdf,Some(232)), Language(Latin,0009,None))))
This approach was suggested by one of circe developers: How do I ignore decoding failures in a JSON array?
Upvotes: 0
Reputation: 524
As said in the comments there are other libraries to be recommended for working with json have a look at this post to get an overview: What JSON library to use in Scala?
Answer to your question with specific framework (play-json)
Personally I can recommend to use the play json framework. To obtain the result you have described with play json, your code might look like this:
import play.api.libs.json._
val json: JsValue = Json.parse(jsonString)
val list = (json \ "languages").as[Seq[JsValue]]
val names = list.map(x => ((x\"name").validate[String] match {
case JsSuccess(v, p ) => v
case _ => ""
}
))
val isActives = list.map(x => ((x\"is_active").validate[String] match {
case JsSuccess(v, p ) => v
case _ => ""
}
))
val completeness = list.map(x => ((x\"completeness").validate[String] match {
case JsSuccess(v, p ) => v
case _ => ""
}
))
// need to know in advance what is your max length of your tuple (tmax)
// since 3rd value "completeness" can be missing, so we just take "" instead
val tmax = 3
val res = for(idx <-0 to tmax-1) yield (names(idx),isActives(idx),completeness(idx))
res.toList
// List[(String, String, String)] = List((English,true,asdf), (Latin,asdf,232), (Latin,0009,""))
There's also a very good documentation for the play json framework, just check it out yourself: https://www.playframework.com/documentation/2.8.x/ScalaJson
Upvotes: 1