Reputation: 1445
I am using PySpark in a Databricks environment and I have a DataFrame like the one below:
display(TestDF)
Count Value
10 Blue
5 Green
21 Red
How can I convert the DF into a JSON format as below:
{"Blue":10,"Green":5,"Red":21}
I have tried the following, but the JSON is not in the format shown above; each row becomes its own object:
TestDF = TestDF.toJSON()
{"count":10,"value":"Blue"}
{"count":5,"value":"Green"}
{"count":21,"value":"Red"}
Thanks.
Upvotes: 1
Views: 148
Reputation: 31550
We can use map_from_arrays (available from Spark 2.4+) together with collect_list on the Count and Value columns.
from pyspark.sql.functions import col, collect_list, map_from_arrays, to_json

# If Count is not an int column, cast the collected values to array<int>
df.agg(to_json(map_from_arrays(collect_list(col("Value")),collect_list(col("Count")).cast("array<int>"))).alias("json")).\
show(10,False)
# If Count is already an int column, no cast is needed
df.agg(to_json(map_from_arrays(collect_list(col("Value")),collect_list(col("Count")))).alias("json")).\
show(10,False)
#+------------------------------+
#|json |
#+------------------------------+
#|{"Blue":10,"Green":5,"Red":21}|
#+------------------------------+
#get as string
df.agg(to_json(map_from_arrays(collect_list(col("Value")),collect_list(col("Count")).cast("array<int>"))).alias("json")).collect()[0][0]
#or
df.agg(to_json(map_from_arrays(collect_list(col("Value")),collect_list(col("Count")).cast("array<int>"))).alias("json")).collect()[0]['json']
#{"Blue":10,"Green":5,"Red":21}
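To make it clearer what the aggregation above is doing, here is a minimal plain-Python sketch of the same three steps (collect each column into a list, pair the lists into a map, serialize the map). It is only an illustration, not Spark code: the function name rows_to_json and the sample_rows list are made up for this example, and plain dicts stand in for the rows you would get from TestDF.collect().

```python
import json

def rows_to_json(rows):
    """Build one JSON object mapping each row's Value to its Count.

    Hypothetical helper for illustration; `rows` is any sequence of
    mappings with "Value" and "Count" keys.
    """
    # Step 1 (like collect_list on each column): two parallel lists in row order.
    keys = [row["Value"] for row in rows]
    values = [int(row["Count"]) for row in rows]  # like .cast("array<int>")
    # Step 2 (like map_from_arrays): pair keys with values by position.
    mapping = dict(zip(keys, values))
    # Step 3 (like to_json): serialize; compact separators avoid extra spaces.
    return json.dumps(mapping, separators=(",", ":"))

# Stand-in data (an assumption): plain dicts instead of Spark Row objects,
# so the sketch runs without a Spark session.
sample_rows = [
    {"Count": 10, "Value": "Blue"},
    {"Count": 5, "Value": "Green"},
    {"Count": 21, "Value": "Red"},
]
result = rows_to_json(sample_rows)
print(result)  # {"Blue":10,"Green":5,"Red":21}
```

One difference to keep in mind: a Python dict silently keeps the last value when a key repeats, which may not match how Spark treats duplicate map keys, so this sketch assumes the Value column is unique.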
Upvotes: 4