Techno04335

Reputation: 1445

How to convert a PySpark DataFrame to JSON

I am using PySpark in a Databricks environment and have a DataFrame like this:

display(TestDF)

Count          Value
10             Blue
5              Green
21             Red

How can I convert the DF into a JSON format as below:

{"Blue":10,"Green":5,"Red":21}

I tried the following, but it produces one JSON object per row instead of the single object shown above:

TestDF = TestDF.toJSON()

{"count":10,"value":"Blue"}
{"count":5,"value":"Green"}
{"count":21,"value":"Red"}

Thanks.

Upvotes: 1

Views: 148

Answers (1)

notNull

Reputation: 31550

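On Spark 2.4+, you can aggregate the Value and Count columns with collect_list, pair them up with map_from_arrays (Value as the keys, Count as the values), and serialize the resulting map with to_json.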

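from pyspark.sql.functions import col, collect_list, map_from_arrays, to_json

# If Count is not an int column, cast the collected values to array<int>
df.agg(to_json(map_from_arrays(collect_list(col("Value")), collect_list(col("Count")).cast("array<int>"))).alias("json")).\
show(10, False)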

# If Count is already an int column, no cast is needed
df.agg(to_json(map_from_arrays(collect_list(col("Value")), collect_list(col("Count")))).alias("json")).\
show(10, False)
#+------------------------------+
#|json                          |
#+------------------------------+
#|{"Blue":10,"Green":5,"Red":21}|
#+------------------------------+

# Get the JSON as a Python string on the driver
df.agg(to_json(map_from_arrays(collect_list(col("Value")),collect_list(col("Count")).cast("array<int>"))).alias("json")).collect()[0][0]
#or
df.agg(to_json(map_from_arrays(collect_list(col("Value")),collect_list(col("Count")).cast("array<int>"))).alias("json")).collect()[0]['json']
#{"Blue":10,"Green":5,"Red":21}
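If the DataFrame is small enough to collect to the driver, the same JSON can probably be built in plain Python instead of with Spark SQL functions. The following is only a sketch of that alternative, not part of the approach above: rows_to_json and its parameter names are hypothetical, and the sample rows are plain dicts standing in for the result of TestDF.collect() so the snippet runs without a Spark session.

```python
import json


def rows_to_json(rows, key_field="Value", count_field="Count"):
    """Build a single JSON object mapping each key_field to its count_field.

    `rows` is any iterable of objects that support row["name"] lookup,
    which includes plain dicts and (by assumption here) collected Spark rows.
    """
    # Cast to int so string-typed counts still serialize as JSON numbers.
    mapping = {row[key_field]: int(row[count_field]) for row in rows}
    # Compact separators give {"Blue":10,...} without spaces.
    return json.dumps(mapping, separators=(",", ":"))


# Stand-in for TestDF.collect(); the values mirror the question's data.
sample_rows = [
    {"Value": "Blue", "Count": 10},
    {"Value": "Green", "Count": 5},
    {"Value": "Red", "Count": 21},
]

result = rows_to_json(sample_rows)
print(result)
```

With a real DataFrame the call would presumably be rows_to_json(TestDF.collect()). This pulls every row onto the driver, so it only suits small results like the one in the question.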

Upvotes: 4
