Utkarsh Saraf

Reputation: 495

Convert DataFrame into an Array of JSON

I have created a Spark DataFrame that looks like this:

+----+-------+
| age| number|
+----+-------+
|  16|     12|
|  16|     13|
|  16|     14|
|  17|     15|
|  17|     16|
|  17|     17|
+----+-------+

I want to convert it into the following JSON format:

[{ 
 'age' : 16,  
 'name' : [12,13,14] 
 },{ 
 'age' : 17,  
 'name' : [15,16,17] 
 }]

How can I achieve this?

Upvotes: 0

Views: 91

Answers (1)

Apurba Pandey

Reputation: 1076

You can try the to_json function, something like this:

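import org.apache.spark.sql.functions.{collect_list, struct, to_json}
import spark.implicits._

val list = List((16,12), (16,13), (16,14), (17,15), (17,16), (17,17))
// SparkSession has no parallelize method, so go through its sparkContext
val df = spark.sparkContext.parallelize(list).toDF("age", "number")

// Collect the numbers per age, serialize each row to a JSON string,
// then gather all JSON strings into a single array column
val jsondf = df.groupBy($"age").agg(collect_list($"number").as("name"))
    .withColumn("json", to_json(struct($"age", $"name")))
    .drop("age", "name")
    .agg(collect_list($"json").as("json"))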

The results are below. I hope it helps.

+------------------------------------------------------------+
|json                                                        |
+------------------------------------------------------------+
|[{"age":16,"name":[12,13,14]}, {"age":17,"name":[15,16,17]}]|
+------------------------------------------------------------+
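
Note that the json column above is an array of JSON strings, not one JSON string. If you need a single string holding the whole array, one option is to collect the structs first and call to_json on the resulting array. This is only a sketch: it assumes a Spark version where to_json accepts an array of structs, and the object name, app name and local master setting are made up for illustration. The order of elements produced by collect_list is not guaranteed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, struct, to_json}

// Hypothetical standalone app; in spark-shell only the body of main is needed
object DataFrameToJsonArray {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for trying this out on a single machine
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("df-to-json-array")
      .getOrCreate()
    import spark.implicits._

    val df = List((16, 12), (16, 13), (16, 14), (17, 15), (17, 16), (17, 17))
      .toDF("age", "number")

    // One row per age, with its numbers gathered into an array called "name"
    val grouped = df.groupBy($"age").agg(collect_list($"number").as("name"))

    // Gather all (age, name) structs into one array and serialize it once,
    // which yields a single JSON array string instead of an array of strings
    val jsonArray: String = grouped
      .agg(to_json(collect_list(struct($"age", $"name"))).as("json"))
      .as[String]
      .head()

    println(jsonArray)
    spark.stop()
  }
}
```

The final collect_list pulls every group into one row, so this is only reasonable when the grouped result is small enough to fit comfortably in memory.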

Upvotes: 2
