FFGH

Reputation: 121

Transpose pyspark rows into columns

I'm trying to transpose some of the rows of my PySpark dataframe into columns.

I've made many attempts, but I can't seem to get the correct result.

The dataframe currently looks like this:

ArticleID | Category | Value
1         | Color    | Black
1         | Gender   | Male
2         | Color    | Green
2         | Gender   | Female
3         | Color    | Blue
3         | Gender   | Male

The result I'm trying to get is:

ArticleID | Color | Gender
1         | Black | Male
2         | Green | Female
3         | Blue  | Male

Edit: This question may overlap with the suggested duplicate in some areas, but this one requires an aggregation that takes the first item for each pivoted cell:

agg(f.first())

The suggested question covers aggregating with numerical operations.

Upvotes: 0

Views: 1558

Answers (1)

akuiper

Reputation: 214927

Use groupBy + pivot:

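import pyspark.sql.functions as f

# Group by ArticleID, turn each distinct Category into its own column,
# and fill each cell with the first Value found for that pair.
df.groupBy('ArticleID').pivot('Category').agg(f.first('Value')).show()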
+---------+-----+------+
|ArticleID|Color|Gender|
+---------+-----+------+
|        3| Blue|  Male|
|        1|Black|  Male|
|        2|Green|Female|
+---------+-----+------+
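
To make the semantics concrete, here is a minimal plain-Python sketch of what groupBy + pivot + first computes for the sample data. It is only an illustration and not how Spark implements it: the helper name `pivot_first` is hypothetical, and it keeps columns in first-seen order as a simplification.

```python
def pivot_first(rows, key, pivot_col, value_col):
    """Group rows by `key`, turn each distinct `pivot_col` value into a
    column, and keep the first `value_col` seen for each (key, column)."""
    columns = []  # distinct pivot values, in first-seen order
    result = {}   # key value -> {pivot value -> first value seen}
    for row in rows:
        col = row[pivot_col]
        if col not in columns:
            columns.append(col)
        cells = result.setdefault(row[key], {})
        # Keep only the first value; later duplicates for the same cell are ignored.
        cells.setdefault(col, row[value_col])
    # A (key, column) pair with no matching row becomes None.
    return [
        {key: k, **{c: cells.get(c) for c in columns}}
        for k, cells in result.items()
    ]


# Sample data copied from the question.
rows = [
    {"ArticleID": 1, "Category": "Color", "Value": "Black"},
    {"ArticleID": 1, "Category": "Gender", "Value": "Male"},
    {"ArticleID": 2, "Category": "Color", "Value": "Green"},
    {"ArticleID": 2, "Category": "Gender", "Value": "Female"},
    {"ArticleID": 3, "Category": "Color", "Value": "Blue"},
    {"ArticleID": 3, "Category": "Gender", "Value": "Male"},
]

pivoted = pivot_first(rows, "ArticleID", "Category", "Value")
for r in pivoted:
    print(r)
```

Because each (ArticleID, Category) pair appears exactly once in this data, taking the first value simply picks the only value available. An aggregate is still needed because pivot always operates on grouped data, and a numerical aggregate such as a sum would not make sense for string values.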

Upvotes: 4
