Reputation: 383
I have a dataframe:
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [
        (2022, 'A1', 'cat', 'eng', 3, 56.768639),
        (2022, 'A1', 'rabbit', 'eng', 10, 56.768639),
        (2022, 'A2', 'dog', 'eng', 10, 54.114841),
        (2022, 'A2', 'mouse', 'eng', 20, 81.114841),
    ],
    ['data', 'group', 'word', 'lang', 'count', 'value'],
)
df.show()
+----+-----+------+----+-----+---------+
|data|group| word|lang|count| value|
+----+-----+------+----+-----+---------+
|2022| A1| cat| eng| 3|56.768639|
|2022| A1|rabbit| eng| 10|56.768639|
|2022| A2| dog| eng| 10|54.114841|
|2022| A2| mouse| eng| 20|81.114841|
+----+-----+------+----+-----+---------+
I want to create a lst column containing an array of dictionaries:
+----+-----+----+--------------------------------------------------------------------------------------------+
|data|group|lang|lst                                                                                         |
+----+-----+----+--------------------------------------------------------------------------------------------+
|2022|A1   |eng |[{count : 3, value : 56.768639, word : cat}, {count : 10, value : 56.768639, word : rabbit}]|
|2022|A2   |eng |[{count : 10, value : 54.114841, word : dog}, {count : 20, value : 81.114841, word : mouse}]|
+----+-----+----+--------------------------------------------------------------------------------------------+
I tried this:
map_values = df\
.groupBy('data', 'group', 'lang')\
.agg(F.collect_list(F.create_map(F.col('word'), F.col('count'), F.col('value'))).alias('descr'))
But it gives me an error:
cannot resolve 'map(`word`, `count`, `value`)' due to data type mismatch: map expects a positive even number of arguments.;
Upvotes: 0
Views: 313
Reputation: 4244
The create_map call fails because map expects key/value pairs, i.e. an even number of arguments, and you passed three columns. Instead, you can first use the to_json
function to generate a JSON string per row, then use the collect_list
function to aggregate:
map_values = df\
.groupBy('data', 'group', 'lang')\
.agg(F.collect_list(F.to_json(F.struct(F.col('count'), F.col('value'), F.col('word')))).alias('descr'))
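For intuition, the groupBy/collect_list aggregation above can be sketched in plain Python (no Spark needed), using the same rows and field names as the question; this is only an illustration of the grouping logic, not Spark's actual execution:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    (2022, 'A1', 'cat', 'eng', 3, 56.768639),
    (2022, 'A1', 'rabbit', 'eng', 10, 56.768639),
    (2022, 'A2', 'dog', 'eng', 10, 54.114841),
    (2022, 'A2', 'mouse', 'eng', 20, 81.114841),
]

# group key: (data, group, lang), mirroring groupBy('data', 'group', 'lang')
key = itemgetter(0, 1, 3)

result = {}
for k, grp in groupby(sorted(rows, key=key), key=key):
    # collect one {count, value, word} dict per row in the group,
    # analogous to collect_list(to_json(struct(count, value, word)))
    result[k] = [{'count': r[4], 'value': r[5], 'word': r[2]} for r in grp]

print(result[(2022, 'A1', 'eng')])
```

Each key maps to the list that ends up in the `descr` column; in Spark the elements are JSON strings rather than Python dicts.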
Upvotes: 1