Reputation: 303
I have dataframe_1:
+-------------+----+---------+
| Name| Age| Salary|
+-------------+----+---------+
|Avery Bradley|25.0|7730337.0|
| Jae Crowder|25.0|6796117.0|
+-------------+----+---------+
and want to transform it to dataframe_2:
+----------------------------------------------------------------------------------------------------------------------+
| json_data |
+----------------------------------------------------------------------------------------------------------------------+
|[{"Name": "Avery Bradley", "Age": 25.0, "Salary" 7730337.0}, {"Name": "Jae Crowder", "Age": 25.0, "Salary" 6796117.0}]|
+----------------------------------------------------------------------------------------------------------------------+
I can do dataframe_1.toPandas().to_dict(orient="records")
, but this would be dataframe to dict(json object) transformation and I need dataframe to dataframe transformation.
A solution in PySpark, if possible, would be appreciated.
Upvotes: 0
Views: 159
Reputation: 42352
You can do a collect_list
of json:
import pyspark.sql.functions as F
df2 = df.agg(F.collect_list(F.to_json(F.struct('*'))).alias('json_data'))
df2.show(truncate=False)
+--------------------------------------------------------------------------------------------------------------+
|json_data |
+--------------------------------------------------------------------------------------------------------------+
|[{"Name":"Avery Bradley","Age":25.0,"Salary":7730337.0}, {"Name":"Jae Crowder","Age":25.0,"Salary":6796117.0}]|
+--------------------------------------------------------------------------------------------------------------+
Upvotes: 1
Reputation: 6082
You can merge columns to a map then create a JSON out of it
(df
.withColumn('json', F.to_json(F.create_map(
F.lit('name'), F.col('name'),
F.lit('age'), F.col('age'),
F.lit('salary'), F.col('salary'),
)))
.agg(F.collect_list('json').alias('json_value'))
)
+----------------------------------------------------------------------------------------------------------------------+
|json_value |
+----------------------------------------------------------------------------------------------------------------------+
|[{"name":"Avery Bradley","age":"25.0","salary":"7730337.0"}, {"name":"Jae Crowder","age":"25.0","salary":"6796117.0"}]|
+----------------------------------------------------------------------------------------------------------------------+
Upvotes: 1