Reputation: 182
I'm working in PySpark and I have a long-format DataFrame like this:
KPI | GROUP | TIME | VALUE |
---|---|---|---|
Sales | A | Before | 100 |
Sales | A | After | 135 |
Sales | B | Before | 90 |
Sales | B | After | 98 |
Revenue | A | Before | 10 |
Revenue | A | After | 12 |
Revenue | B | Before | 5 |
Revenue | B | After | 8 |
What I'd like to end up with is a wide-format DataFrame like this:
KPI | GROUP | BEFORE | AFTER |
---|---|---|---|
Sales | A | 100 | 135 |
Sales | B | 90 | 98 |
Revenue | A | 10 | 12 |
Revenue | B | 5 | 8 |
Upvotes: 1
Views: 356
Reputation: 26676
Just pivot on TIME, grouping by the other two columns:
from pyspark.sql.functions import first
df1.groupBy('KPI', 'GROUP').pivot('TIME').agg(first('VALUE')).show()
Upvotes: 3