djsumdog

Reputation: 2710

Spray test case doesn't properly unmarshal UTF-8 encoded JSON using HttpEntity

I have a ScalaTest spec for part of an API written using the Spray framework that looks like the following:

"correctly deserializes multi-lang title metadata" in {
  implicit def json4sFormats: org.json4s.Formats = ModelJsonHelper.jsonFormats

  val v2MultiLangTitle = getStringFromResource("/json_samples/cmp.asset.v2.AssetWriteMultiMetadata.json")

  //Deserialize
  val v2AssetEither = HttpEntity(`application/cmp.ela.assetWrite.v2+json`, v2MultiLangTitle).as[Asset]
  v2AssetEither.isRight shouldEqual true
  v2AssetEither.right.map(asset => {
    asset.assetMetadata.getOrElse(List()).size shouldEqual(3)
    asset.assetMetadata.getOrElse(List())(1).language shouldEqual("es")
    asset.assetMetadata.getOrElse(List())(1).data.title shouldEqual(Some("Encabezado prueba de AFP"))
    asset.assetMetadata.getOrElse(List())(2).language shouldEqual("tlh")
    asset.assetMetadata.getOrElse(List())(2).data.title shouldEqual(Some("Daj jaw AFP"))
    asset.assetMetadata.getOrElse(List())(0).language shouldEqual("de")
    asset.assetMetadata.getOrElse(List())(0).data.title shouldEqual(Some("Test AFP Überschrift"))
  })
}

def getStringFromResource(input: String): String = {
  Source.fromInputStream(this.getClass.getResourceAsStream(input))(Codec.UTF8).getLines.mkString("")
}

The JSON being processed is the following:

{
  "assetMetadata" : [
    {
      "title": "Test AFP Überschrift",
      "language": "de"
    },
    {
      "title": "Encabezado prueba de AFP",
      "language": "es"
    },
    {
      "title": "Daj jaw AFP",
      "language": "tlh"
    }
  ]
}

and the failure is occurring on the German umlaut:

[info] - correctly deserializes multi-lang title metadata *** FAILED ***
[info]   Some("Test AFP �berschrift") did not equal "Test AFP Überschrift" (V2AssetWriteMarshallerSpec.scala:367)

Under the surface, json4s is used in the unmarshalling process. However, if I use the json4s parser directly, the German character is processed correctly:

import scala.io.Source
import java.io.FileInputStream
val v = Source.fromInputStream(new FileInputStream("/path/to/project/src/test/resources/json_samples/cmp.asset.v2.AssetWriteMultiMetadata.json")).getLines.mkString("")
import org.json4s._
import org.json4s.native.JsonMethods._

val obj = parse(v)
(obj \ "assetMetadata")

Gives me the result:

res0: org.json4s.JValue = JArray(List(JObject(List((title,JString(Test AFP Überschrift)), ....

I'm on Spray 1.3.3 and json4s 3.2.10. The media type passed to HttpEntity is a custom type, and I've tried adding ( "charset" -> "UTF-8" ) as a parameter like so:

val `application/cmp.ela.assetWrite.v2+json` = register(
    MediaType.custom(
      mainType = "application",
      subType = "cmp.ela.assetWrite.v2+json",
      compressible = true,
      binary = false,
      parameters = Map[String,String]( "charset" -> "UTF-8" )
    )
  )

...but the test still fails with the invalid character. How do I get Spray to correctly unmarshal a string with international characters in it?

Upvotes: 1

Views: 558

Answers (1)

retrospectacus

Reputation: 2586

I helped you find the solution in the Scala IRC channel. I found this link and you did the rest!

https://github.com/spray/spray/blob/master/spray-http/src/main/scala/spray/http/MediaType.scala#L168

Fixed test:

  //Deserialize
  val v2AssetEither = HttpEntity(`application/cmp.ela.assetWrite.v2+json` withCharset(HttpCharsets.`UTF-8`), v2MultiLangTitle).as[Asset]
  v2AssetEither.isRight shouldEqual true

The key is adding

withCharset(HttpCharsets.`UTF-8`)
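
To illustrate why an explicit charset can matter, here is a minimal JVM-only sketch. It does not use Spray at all, and `CharsetMismatchDemo` and `roundTrip` are hypothetical names invented for this example. It shows that encoding a string with one charset and decoding it with another produces the same kind of replacement character seen in the failed test. Whether this is exactly what happens inside Spray when no charset is set on the content type is an assumption, not something confirmed here.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical demo object; not part of Spray or json4s.
object CharsetMismatchDemo {
  // "\u00DC" is the escape for the capital U with umlaut, so the example
  // does not depend on the encoding of this source file.
  val title = "Test AFP \u00DCberschrift"

  // Encode the string to bytes with one charset, then decode with another.
  def roundTrip(s: String, encodeWith: Charset, decodeWith: Charset): String =
    new String(s.getBytes(encodeWith), decodeWith)

  def main(args: Array[String]): Unit = {
    // Assumed scenario: bytes written as ISO-8859-1 but read back as UTF-8.
    val mismatched = roundTrip(title, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)
    // Same charset on both sides.
    val matched = roundTrip(title, StandardCharsets.UTF_8, StandardCharsets.UTF_8)

    println(mismatched == title)          // the umlaut was lost in the mismatch
    println(mismatched.contains('\uFFFD')) // U+FFFD is the replacement character
    println(matched == title)             // round trip is lossless
  }
}
```

With matching charsets the string survives the round trip. With the mismatch, the single ISO-8859-1 byte for the umlaut is not valid UTF-8, so the decoder substitutes U+FFFD. Stating the charset explicitly on the content type, as in the fixed test, removes the guesswork on both sides.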

Upvotes: 1
