My goal is to have a Spark DataFrame that holds each of my Candy
objects in a separate row, with their respective properties, like this:
+------------------------------------+
|main                                |
+------------------------------------+
|{"brand":"brand1","name":"snickers"}|
|{"brand":"brand2","name":"haribo"}  |
+------------------------------------+
Case class for the proof of concept:
case class Candy(
brand: String,
name: String)
val candy1 = Candy("brand1", "snickers")
val candy2 = Candy("brand2", "haribo")
So far I have only managed to put them in the same row with:
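import org.json4s.DefaultFormats
import org.json4s.jackson.Serialization.write
import spark.implicits._ // provides .toDF (already imported in spark-shell)
implicit val formats = DefaultFormats
// write() serializes the whole array into a single JSON string,
// so this Seq has one element and the DataFrame ends up with one row
val body = write(Array(candy1, candy2))
val df = Seq(body).toDF("main")
df.show(5, false)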
This gives me everything in one row instead of two:
+-------------------------------------------------------------------------+
|main                                                                     |
+-------------------------------------------------------------------------+
|[{"brand":"brand1","name":"snickers"},{"brand":"brand2","name":"haribo"}]|
+-------------------------------------------------------------------------+
How can I split each object into its own row while keeping the schema of my Candy
object?
Do you want to keep each item as a JSON string inside the DataFrame?
If you don't, you can do this, taking advantage of the Dataset's ability to handle case classes:
import spark.implicits._ // provides .toDS (already imported in spark-shell)
val ds = Seq(candy1, candy2).toDS()
ds.show()
This will give you the following output:
+------+--------+
| brand| name|
+------+--------+
|brand1|snickers|
|brand2| haribo|
+------+--------+
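If the data already arrives as the single JSON-array string from the question, another option is to let Spark parse it: `from_json` with an `ArrayType` of the schema derived from `Candy`, then `explode` to get one row per array element. This is a minimal sketch, assuming Spark 2.2 or later (where `from_json` accepts an array-of-struct schema); the names `SplitJsonArray` and `splitCandies` are hypothetical:

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types.ArrayType

// Top-level case class so Spark can derive an Encoder for it.
case class Candy(brand: String, name: String)

object SplitJsonArray {
  // Hypothetical helper: takes the question's JSON-array string and returns
  // one row per array element, typed as Candy.
  def splitCandies(spark: SparkSession, jsonArray: String): Dataset[Candy] = {
    import spark.implicits._

    // Same shape as the question's df: one row, one column named "main".
    val df = Seq(jsonArray).toDF("main")

    // Derive the struct schema from the case class instead of writing it by hand.
    val candySchema = Encoders.product[Candy].schema

    df
      // Parse the string as array<struct<brand,name>> and emit one row per element.
      .select(explode(from_json(col("main"), ArrayType(candySchema))).as("candy"))
      // Flatten the struct into top-level brand and name columns.
      .select("candy.*")
      .as[Candy]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("candy").getOrCreate()
    val body = """[{"brand":"brand1","name":"snickers"},{"brand":"brand2","name":"haribo"}]"""
    splitCandies(spark, body).show()
    spark.stop()
  }
}
```

Deriving the schema with `Encoders.product[Candy]` keeps the parsed columns in sync with the case class, so adding a field to `Candy` does not require updating a hand-written `StructType`.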
IMHO the Dataset approach is the best option, but if you want to keep your data as a JSON string, you can first define a toJson
method inside your case class:
case class Candy(brand:String, name: String) {
def toJson = s"""{"brand": "$brand", "name": "$name" }"""
}
And then build the DF using that method:
val df=Seq(candy1.toJson, candy2.toJson).toDF("main")
Output:
+----------------------------------------+
|main |
+----------------------------------------+
|{"brand": "brand1", "name": "snickers" }|
|{"brand": "brand2", "name": "haribo" } |
+----------------------------------------+
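The interpolated toJson above does not escape anything, so a brand or name containing a double quote would produce invalid JSON. If you do want JSON strings, one alternative is to hand serialization to Spark with `Dataset.toJSON`, which returns one JSON string per row in a column named `value`. This is a sketch, not the answer's original method; the names `CandyJson` and `candiesAsJson` are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Plain case class; no hand-written toJson needed.
case class Candy(brand: String, name: String)

object CandyJson {
  // Hypothetical helper: Spark serializes each Candy to a JSON string,
  // escaping special characters, and the column is renamed to "main".
  def candiesAsJson(spark: SparkSession, candies: Seq[Candy]): DataFrame = {
    import spark.implicits._
    candies.toDS().toJSON.toDF("main")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("candy-json").getOrCreate()
    val candies = Seq(Candy("brand1", "snickers"), Candy("brand2", "haribo"))
    candiesAsJson(spark, candies).show(false)
    spark.stop()
  }
}
```

Spark writes compact JSON, so the strings come out as {"brand":"brand1","name":"snickers"} without the spaces in the hand-written output above.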