Reputation: 23
I currently have a csv of two columns:
"name","numbers"
"abc","123"
"abc","234"
"def","123"
I am trying to format the data into a JSON map:
{"abc":["123","234"],"def":["123"]}
I created a data frame and then used collect_list/spark sql to group by names and collect numbers but i'm not able to get the strings to have quotes and i'm also not sure how to format it as a JSON map:
scala> val df = spark.read.format("csv").option("header","true").load("test.csv")
df: org.apache.spark.sql.DataFrame = [name: string, numbers: string]
scala> df.createOrReplaceTempView("v")
scala> spark.sql("select name, collect_list(numbers) from v group by name limit 2").show(false)
+----+---------------------+
|name|collect_list(numbers)|
+----+---------------------+
|abc |[123, 234] |
|def |[123] |
+----+---------------------+
Any suggestions on a better approach to get quotes around the numbers and then generate a JSON map out of the result?
Upvotes: 1
Views: 625
Reputation: 10382
Check below step by step execution code.
scala>
df
.groupBy($"name")
.agg(collect_list($"numbers").as("numbers"))
.show(false)
+----+----------+
|name|numbers |
+----+----------+
|abc |[123, 234]|
|def |[123] |
+----+----------+
scala> df
.groupBy($"name")
.agg(collect_list($"numbers").as("numbers"))
.select(map($"name", $"numbers").as("data"))
.show(false)
+-------------------+
|data |
+-------------------+
|{abc -> [123, 234]}|
|{def -> [123]} |
+-------------------+
scala>
df
.groupBy($"name")
.agg(collect_list($"numbers").as("numbers"))
.select(map($"name", $"numbers").as("data"))
.select(to_json($"data").as("output"))
.show(false)
+---------------------+
|output |
+---------------------+
|{"abc":["123","234"]}|
|{"def":["123"]} |
+---------------------+
scala> df
.groupBy($"name")
.agg(collect_list($"numbers").as("numbers"))
.select(map($"name", $"numbers").as("data"))
.select(collect_list(to_json($"data")).as("output"))
.show(false)
+----------------------------------------+
|output |
+----------------------------------------+
|[{"abc":["123","234"]}, {"def":["123"]}]|
+----------------------------------------+
Upvotes: 1