flame1481
flame1481

Reputation: 23

How do I convert a spark dataframe to a JSON map?

I currently have a csv of two columns:

"name","numbers"
"abc","123"
"abc","234"
"def","123"

I am trying to format the data into a JSON map:

{"abc":["123","234"],"def":["123"]}

I created a data frame and then used collect_list/spark sql to group by names and collect numbers but i'm not able to get the strings to have quotes and i'm also not sure how to format it as a JSON map:

scala> val df = spark.read.format("csv").option("header","true").load("test.csv")
df: org.apache.spark.sql.DataFrame = [name: string, numbers: string]              

scala> df.createOrReplaceTempView("v")

scala> spark.sql("select name, collect_list(numbers) from v group by name limit 2").show(false)
+----+---------------------+
|name|collect_list(numbers)|
+----+---------------------+
|abc |[123, 234]           |
|def |[123]                |
+----+---------------------+

Any suggestions on a better approach to get quotes around the numbers and then generate a JSON map out of the result?

Upvotes: 1

Views: 625

Answers (1)

s.polam
s.polam

Reputation: 10382

Check below step by step execution code.

scala> 
df
.groupBy($"name")
.agg(collect_list($"numbers").as("numbers"))
.show(false)

+----+----------+
|name|numbers   |
+----+----------+
|abc |[123, 234]|
|def |[123]     |
+----+----------+
scala> df
.groupBy($"name")
.agg(collect_list($"numbers").as("numbers"))
.select(map($"name", $"numbers").as("data"))
.show(false)

+-------------------+
|data               |
+-------------------+
|{abc -> [123, 234]}|
|{def -> [123]}     |
+-------------------+
scala> 
df
.groupBy($"name")
.agg(collect_list($"numbers").as("numbers"))
.select(map($"name", $"numbers").as("data"))
.select(to_json($"data").as("output"))
.show(false)

+---------------------+
|output               |
+---------------------+
|{"abc":["123","234"]}|
|{"def":["123"]}      |
+---------------------+
scala> df
.groupBy($"name")
.agg(collect_list($"numbers").as("numbers"))
.select(map($"name", $"numbers").as("data"))
.select(collect_list(to_json($"data")).as("output"))
.show(false)

+----------------------------------------+
|output                                  |
+----------------------------------------+
|[{"abc":["123","234"]}, {"def":["123"]}]|
+----------------------------------------+

Upvotes: 1

Related Questions