Reputation: 23
I'm trying to aggregate data, and the code below works.
| name | id | day   | value |
|------|----|-------|-------|
| ken  | 01 | 02-01 | good  |
| ken  | 01 | 02-02 | error |
import pyspark.sql.functions as func

spark_df.groupBy("name", "id").agg(
    func.collect_list(
        func.create_map(func.col("day"), func.col("value"))
    ).alias("day_val")
)
This aggregates day_val as a list of maps, like this:
[{"day1":"value1"},{"day2":"value2"},{"day3":"value3"},....]
But I want to save it as
{"day1":"value1","day2":"value2","day3":"value3"}
because in DynamoDB I want to use it as a map, not a list. Can I convert the list to a single map, or aggregate it as a map directly?
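For reference, the aggregated column comes out as array<map<string,string>>, which you can confirm like this (assuming the aggregated DataFrame above is saved as agg_df, a hypothetical name):

# 'agg_df' is the result of the groupBy/agg above (hypothetical name)
agg_df.printSchema()                           # day_val: array<map<string,string>>
agg_df.select("day_val").show(truncate=False)  # one list of single-entry maps per group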
Thank you.
Upvotes: 2
Views: 522
Reputation: 32670
Or use map_from_arrays:
from pyspark.sql import functions as F
df1 = df.groupBy('name', 'id').agg(
    # zip the collected days and values positionally into one map per group
    F.map_from_arrays(
        F.collect_list('day'),
        F.collect_list('value')
    ).alias('day_val')
)
df1.show(truncate=False)
#+----+---+-------------------------------+
#|name|id |day_val |
#+----+---+-------------------------------+
#|ken |01 |[02-01 -> good, 02-02 -> error]|
#+----+---+-------------------------------+
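Since the goal is a DynamoDB map attribute, one quick way to check that day_val is now a single map rather than a list of maps is to serialize it with to_json (a minimal check against the df1 above, assuming Spark 2.4+ where to_json supports map columns):

# serialize the map column to a JSON object string for inspection
df1.select(F.to_json('day_val').alias('day_val_json')).show(truncate=False)
#+--------------------------------+
#|day_val_json                    |
#+--------------------------------+
#|{"02-01":"good","02-02":"error"}|
#+--------------------------------+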
Upvotes: 1
Reputation: 42352
You can use map_from_entries:
import pyspark.sql.functions as F
result = df.groupBy('name', 'id').agg(
    # collect (day, value) structs and turn the entries into one map per group
    F.map_from_entries(
        F.collect_list(
            F.struct('day', 'value')
        )
    ).alias('day_val')
)
result.show(truncate=False)
+----+---+-------------------------------+
|name|id |day_val |
+----+---+-------------------------------+
|ken |1 |[02-01 -> good, 02-02 -> error]|
+----+---+-------------------------------+
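If you already have the list-of-maps column from the question's original code, you can also fold it into a single map without re-aggregating, using the SQL higher-order aggregate function (a sketch, assuming Spark 2.4+ and that the list column is named day_val on a DataFrame called agg_df, a hypothetical name):

merged = agg_df.withColumn(
    'day_val',
    # fold the single-entry maps into one map; map() is the empty starting value
    F.expr("aggregate(day_val, map(), (acc, m) -> map_concat(acc, m))")
)

Note that in Spark 3.x map_concat raises on duplicate keys by default, which is fine here since each day appears once per group.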
Upvotes: 1