I'm trying to transpose some of my PySpark DataFrame rows into columns.
I've made many attempts, but I can't seem to get the correct result.
The DataFrame currently looks like this:
ArticleID | Category | Value
1         | Color    | Black
1         | Gender   | Male
2         | Color    | Green
2         | Gender   | Female
3         | Color    | Blue
3         | Gender   | Male
The result I'm trying to get is:
ArticleID | Color | Gender
1         | Black | Male
2         | Green | Female
3         | Blue  | Male
Edit: This question may overlap with the suggested duplicate in some areas, but this one requires aggregating on the first item for each pivoted row:
agg(f.first('Value'))
The suggested question's aggregation uses numerical operations.
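The reshaping asked for here is a long-to-wide pivot in which each cell keeps the first Value seen for its (ArticleID, Category) pair, which is what agg(f.first('Value')) expresses. As a plain-Python sketch of those semantics (no Spark involved; the function name pivot_first is hypothetical), assuming rows arrive as (ArticleID, Category, Value) tuples. In Spark itself, which row counts as "first" depends on row order after the shuffle and is not guaranteed; with exactly one Value per pair, as in this data, that does not matter.

```python
from typing import Dict, Iterable, Tuple


def pivot_first(rows: Iterable[Tuple[int, str, str]]) -> Dict[int, Dict[str, str]]:
    """Group by ArticleID, spread Category into keys, keep the first Value per cell."""
    result: Dict[int, Dict[str, str]] = {}
    for article_id, category, value in rows:
        cells = result.setdefault(article_id, {})
        # setdefault only stores the value if the cell is still empty,
        # so a later duplicate (article_id, category) pair is ignored.
        cells.setdefault(category, value)
    return result


rows = [
    (1, "Color", "Black"),
    (1, "Gender", "Male"),
    (2, "Color", "Green"),
    (2, "Gender", "Female"),
    (3, "Color", "Blue"),
    (3, "Gender", "Male"),
]
print(pivot_first(rows))
# → {1: {'Color': 'Black', 'Gender': 'Male'}, 2: {'Color': 'Green', 'Gender': 'Female'}, 3: {'Color': 'Blue', 'Gender': 'Male'}}
```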
Use groupBy + pivot:
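import pyspark.sql.functions as f
# One output row per ArticleID, one column per distinct Category value;
# first('Value') fills each cell with the Value for that ArticleID/Category pair.
df.groupBy('ArticleID').pivot('Category').agg(f.first('Value')).show()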
+---------+-----+------+
|ArticleID|Color|Gender|
+---------+-----+------+
| 3| Blue| Male|
| 1|Black| Male|
| 2|Green|Female|
+---------+-----+------+
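Two details of the output above are worth knowing: show() lists the groups in no guaranteed order (here 3, 1, 2), and an ArticleID with no row for some Category gets null in that column. GroupedData.pivot also accepts an optional list of values, e.g. pivot('Category', ['Color', 'Gender']), which fixes the column order and spares Spark the extra pass that computes the distinct values; appending .orderBy('ArticleID') sorts the rows. Below is a plain-Python sketch of that result shape (the function name pivot_fixed is hypothetical), with None standing in for null and the first value seen in input order winning:

```python
from typing import Dict, Iterable, List, Tuple


def pivot_fixed(rows: Iterable[Tuple[int, str, str]], values: List[str]) -> List[tuple]:
    """Return one tuple per key: (key, cell for values[0], cell for values[1], ...)."""
    cells: Dict[int, Dict[str, str]] = {}
    for key, category, value in rows:
        # Every key gets a row, even if all of its cells end up None.
        row_cells = cells.setdefault(key, {})
        # Categories outside the fixed list get no column.
        if category in values:
            row_cells.setdefault(category, value)  # first value seen wins
    # Sort by key (like orderBy('ArticleID')); missing cells become None.
    return [(key, *(cells[key].get(v) for v in values)) for key in sorted(cells)]


rows = [
    (3, "Color", "Blue"),
    (1, "Color", "Black"),
    (1, "Gender", "Male"),
    (2, "Gender", "Female"),
]
print(pivot_fixed(rows, ["Color", "Gender"]))
# → [(1, 'Black', 'Male'), (2, None, 'Female'), (3, 'Blue', None)]
```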