Rakshith

Reputation: 664

Converting a sequence of JSON objects to an RDD

I currently have a JSON file, say student.json. Its structure looks something like this:

{"serialNo":"1","name":"Rahul"}
{"serialNo":"2","name":"Rakshith"}

case class Student(serialNo:Int,name:String)

student.json is a huge file which I am planning to parse with a Spark job. Here is the snippet:

    import play.api.libs.json.{ Json, JsObject, JsString }
.....
.....
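    // Desugars to flatMap over the RDD, so lines that do not validate as a Student are dropped.
    // Requires an implicit Reads[Student] in scope.
    for {
      jsonLine <- sc.textFile("student.json")
      student  <- Json.parse(jsonLine).asOpt[Student]
    } yield (student.serialNo -> student.name)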

Is there a better way to do this?

Upvotes: 0

Views: 638

Answers (1)

David S.

Reputation: 11200

If student.json is a huge file and each line is a valid JSON object on its own, you can do:

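// myRdd is an RDD[Option[Student]]; a line whose JSON does not validate as a Student becomes None.
val myRdd = sc.textFile("student.json").map(l => Json.parse(l).asOpt[Student])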

If you want to bring the contents of the RDD back to the driver (only sensible when the result fits in its memory), you can:

val students = myRdd.collect() // a local Array that you can work with in the usual way
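
The approach above can be sketched a little more fully. This is only a minimal sketch under some assumptions that are not in the question: `serialNo` is declared as a `String` because the sample JSON quotes it, the `Reads[Student]` is generated with Play JSON's `Json.reads` macro, and the names `ParseStudents` and `parseStudent` are hypothetical. It uses `flatMap` rather than `map`, so lines that fail to parse are dropped instead of being kept as `None`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import play.api.libs.json.{Json, Reads}
import scala.util.Try

// Assumption: serialNo is a String here because the sample JSON quotes it ("1").
case class Student(serialNo: String, name: String)

object Student {
  // Macro-generated reader; asOpt[Student] needs an implicit Reads[Student].
  implicit val reads: Reads[Student] = Json.reads[Student]
}

object ParseStudents {
  // Returns None for malformed JSON (Json.parse throws) and for
  // well-formed JSON that does not validate as a Student.
  def parseStudent(line: String): Option[Student] =
    Try(Json.parse(line)).toOption.flatMap(_.asOpt[Student])

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("parse-students").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // flatMap flattens each Option, leaving an RDD of (serialNo, name) pairs.
    val pairs = sc.textFile("student.json")
      .flatMap(line => parseStudent(line))
      .map(s => s.serialNo -> s.name)

    pairs.take(2).foreach(println)
    sc.stop()
  }
}
```

Wrapping `Json.parse` in `Try` is a design choice: without it, a single malformed line would fail the whole task.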

I see you are importing play.api.libs.json, which comes from the Play Framework. I don't think running a Spark program inside a web application is a good idea.
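
If the Play dependency is the concern, one alternative is Spark's own JSON reader, which treats each line of the input as one JSON object by default. This is a different technique from the one above, shown only as a sketch: it assumes Spark 2.x or later (`SparkSession`), and the names `StudentsFromJson` and `loadStudents` are hypothetical. Because the sample file quotes `serialNo`, the column is inferred as a string and is cast to an integer before mapping to the case class.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

case class Student(serialNo: Int, name: String)

object StudentsFromJson {
  // Reads line-delimited JSON and maps each row to a Student.
  def loadStudents(spark: SparkSession, path: String): Dataset[Student] = {
    import spark.implicits._
    spark.read.json(path)
      // serialNo is inferred as a string from the quoted values, so cast it.
      .select($"serialNo".cast("int").as("serialNo"), $"name")
      .as[Student]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("students-from-json")
      .master("local[*]")
      .getOrCreate()

    // Drop down to an RDD of (serialNo, name) pairs, as in the question.
    val pairs = loadStudents(spark, "student.json").rdd.map(s => s.serialNo -> s.name)
    pairs.take(2).foreach(println)
    spark.stop()
  }
}
```

Note that a record with a missing or non-numeric `serialNo` yields a null after the cast, which fails when decoded into the non-nullable `Int` field; filtering such rows first would be needed for dirty input.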

Upvotes: 1
