geek-tech

Reputation: 105

Loading and Parsing JSON with Spark using Scala Jackson library

I'm trying to load and parse JSON files (tweets), but I get back the error below:

error: not found: value mapper
               Some(mapper.readValue(record, classOf[Tweet]))

And this is the Scala script:

import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.DeserializationFeature

case class Tweet(tweet_id: Int, created_unixtime: Long, created_time: String, lang: String, displayname: String, time_zone: String, msg: String)

val input = sc.textFile("hdfs://localhost:54310/tmp/data_staging/tweets*") // the tweets load fine

// Parse them
val result = input.flatMap(record => {
  try {
    Some(mapper.readValue(record, classOf[Tweet]))
  } catch {
    case e: Exception => None
  }
})

Upvotes: 0

Views: 1236

Answers (1)

Silvio

Reputation: 4207

So, the question is how to load JSON but map it to a case class.

In that case, just use Spark's built-in JSON reader and then convert to a Dataset of your case class:

case class Tweet(tweet_id: Int, created_unixtime: Long, created_time: String, lang: String, displayname: String, time_zone: String, msg: String)

import spark.implicits._ // brings the Encoder[Tweet] needed by .as[Tweet] into scope

val input = spark.read.json("hdfs://localhost:54310/tmp/data_staging/tweets*").as[Tweet]
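You can then work with the typed fields directly. For example (a quick sanity check; note that spark.read.json expects one JSON document per line by default):

input.show(5)
input.filter(_.lang == "en").count() // typed access to the case-class fields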

The assumption here is that the field names in the JSON documents match your case class. If they don't, you can simply do a map to convert each Row to your case class, as sketched below.
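A minimal sketch of that fallback, assuming the JSON uses field names that differ from the case class (the names id, timestamp_ms, created_at, user_name, and text here are hypothetical; adjust the getAs keys to your actual schema):

import spark.implicits._

val df = spark.read.json("hdfs://localhost:54310/tmp/data_staging/tweets*")

// Map each Row to the case class by hand. The JSON field names below
// are hypothetical -- replace them with the ones in your documents.
val tweets = df.map { row =>
  Tweet(
    tweet_id         = row.getAs[Long]("id").toInt,
    created_unixtime = row.getAs[Long]("timestamp_ms"),
    created_time     = row.getAs[String]("created_at"),
    lang             = row.getAs[String]("lang"),
    displayname      = row.getAs[String]("user_name"),
    time_zone        = row.getAs[String]("time_zone"),
    msg              = row.getAs[String]("text")
  )
}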

Upvotes: 1
