Reputation: 11
I want to make a table using list element frequency from dataframe
example)
As Is
+----+---------------+
| id | data |
+----+---------------+
|a |[1,2,3,4,5] |
|b |[2,2,4,5] |
|c |[56,7,1,1,1] |
+----+---------------+
To Be
+----+-----+-----+-----+-----+-----+-----+-----+
| id | 1 | 2 | 3 | 4 | 5 | 7 | 56 |
+----+-----+-----+-----+-----+-----+-----+-----+
|a | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
|b | 0 | 2 | 0 | 1 | 1 | 0 | 0 |
|c | 3 | 0 | 0 | 0 | 0 | 1 | 1 |
+----+-----+-----+-----+-----+-----+-----+-----+
How can I make "AS IS" to "TO BE"?
Upvotes: 0
Views: 186
Reputation: 6644
A possible way is to explode
the arrays and then pivot
the exploded values.
# input data
data_sdf.show()
# +---+----------------+
# | id| data|
# +---+----------------+
# | a| [1, 2, 3, 4, 5]|
# | b| [2, 2, 4, 5]|
# | c|[56, 7, 1, 1, 1]|
# +---+----------------+
data_sdf. \
withColumn('data_explode', func.explode('data')). \
groupBy('id'). \
pivot('data_explode'). \
count(). \
fillna(0). \
show()
# +---+---+---+---+---+---+---+---+
# | id| 1| 2| 3| 4| 5| 7| 56|
# +---+---+---+---+---+---+---+---+
# | c| 3| 0| 0| 0| 0| 1| 1|
# | b| 0| 2| 0| 1| 1| 0| 0|
# | a| 1| 1| 1| 1| 1| 0| 0|
# +---+---+---+---+---+---+---+---+
Upvotes: 1