Reputation: 93
I have this dataframe in pyspark.
+----------------------+------------------+------------+------------+
| transaction_date| id|latest_quote|policy_price|
+----------------------+------------------+------------+------------+
| 16/12/2022|0063t000013qmPFAAY|2022-11-25 | 899.18|
| 16/12/2022|0063t000013qQNTAA2|2022-11-30 | 3147.78|
+----------------------+------------------+------------+------------+
and i need this.
+--------------------------------------------------------------------------------------+
|unique_column
|
+--------------------------------------------------------------------------------------+
|{"transaction_date":"16/12/2022", "trans_opp":[
|
| {"id":"0063t000013qmPFAAY", "latest_quote":"2022-11-25", "policy_price":"899.18"},
|
| {"id":"0063t000013qQNTAA2", "latest_quote":"2022-11-30", "policy_price":" 3147.78"}]}|
+--------------------------------------------------------------------------------------+
The idea is all rows stay in one column call unique_column and one row,
Upvotes: 1
Views: 33
Reputation: 1858
I didn't use exactly the same dataframe you used to ask the question, but I the column names are the in my solution:
df = spark.createDataFrame([
(1, 2, 3, 4),
(1, 6, 7, 8)
], ['transaction_date', 'id', 'latest_quote', 'policy_price'])
df = (
df
.groupBy('transaction_date')
.agg(f.collect_list(f.struct(f.col('id'), f.col('latest_quote'), f.col('policy_price'))).alias('trans_opp'))
.withColumn('unique_column', f.struct(f.col('transaction_date'), f.col('trans_opp')))
.withColumn('unique_column', f.to_json(f.col('unique_column')))
.select('unique_column')
)
df.show(truncate= False)
output:
+---------------+
|unique_column |
+--------------------------------------------------------------------------------------------------------------------------+
|{"transaction_date":1,"trans_opp":[{"id":2,"latest_quote":3,"policy_price":4},{"id":6,"latest_quote":7,"policy_price":8}]}|
+--------------------------------------------------------------------------------------------------------------------------+
Upvotes: 1