Reputation: 2611
I have a DataFrame with many columns, 50 or more (a sample is shown below):
+---+---+---+---+---+---+---+---+----+---+---+...
| c1| c2| c3| c4| c5| c6| c7| c8|type|clm|val|...
+---+---+---+---+---+---+---+---+----+---+---+...
| 11|5.0|3.0|3.0|3.0|4.0|3.0|3.0|  t1|  a|  5|...
| 31|5.0|3.0|3.0|3.0|4.0|3.0|3.0|  t2|  b|  6|...
| 11|5.0|3.0|3.0|3.0|4.0|3.0|3.0|  t1|  a|  9|...
+---+---+---+---+---+---+---+---+----+---+---+...
I want to turn the values of one column into separate columns, so I am thinking of using the code below:
df.groupBy("type").pivot("clm").agg(first("val")).show()
This converts the row values into columns, but the other columns (c1 to c8) are not part of the resulting DataFrame.
Is it okay to use the approach below to keep all the columns after the pivot?
df.groupBy("c1","c2","c3","c4","c5","c6","c7","c8","type").pivot("clm").agg(first("val")).show()
Upvotes: 0
Views: 438
Reputation: 9415
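pivot is not an aggregate function that can be passed to agg; it is a method on the grouped data, called between groupBy and agg. One way to keep the other columns while grouping only by type is to aggregate them separately and join the pivoted result back on the grouping key:
df
  .groupBy("type")
  .agg(
    first("c1").alias("c1"),
    first("c2").alias("c2"),
    first("c3").alias("c3"),
    first("c4").alias("c4"),
    first("c5").alias("c5"),
    first("c6").alias("c6"),
    first("c7").alias("c7"),
    first("c8").alias("c8")
  )
  .join(df.groupBy("type").pivot("clm").agg(first("val")), "type")
  .show()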
Writing it like that assumes that the values of c1..c8 are duplicated within the same type. If they are not, the .groupBy(...) needs to be adjusted to match exactly how your data is organized.
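To make that caveat concrete, here is a minimal plain-Python sketch (standard library only, not Spark) of what a group-by followed by a pivot with a "first" aggregate computes. The helper name pivot_first and the sample rows are hypothetical; the rows mirror the table in the question, with only c1 kept for brevity. Unlike Spark, this sketch always takes the first value in input order.

```python
def pivot_first(rows, group_cols, pivot_col, value_col):
    """Group rows by group_cols and spread pivot_col values into columns,
    keeping the first value_col seen for each (group, pivot value) pair."""
    pivot_values = sorted({r[pivot_col] for r in rows})
    groups = {}   # group key -> output row (dicts keep insertion order)
    seen = set()  # (group key, pivot value) pairs that already have a value
    for r in rows:
        key = tuple(r[c] for c in group_cols)
        if key not in groups:
            out = dict(zip(group_cols, key))
            out.update({p: None for p in pivot_values})
            groups[key] = out
        if (key, r[pivot_col]) not in seen:
            seen.add((key, r[pivot_col]))
            groups[key][r[pivot_col]] = r[value_col]
    return list(groups.values())


# Hypothetical sample rows modelled on the question's table.
rows = [
    {"c1": 11, "type": "t1", "clm": "a", "val": 5},
    {"c1": 31, "type": "t2", "clm": "b", "val": 6},
    {"c1": 11, "type": "t1", "clm": "a", "val": 9},
]

# Grouping by "type" alone: c1 is not in the result.
by_type = pivot_first(rows, ["type"], "clm", "val")

# Grouping by every column to keep: same number of rows here,
# because c1 is constant within each type.
by_all = pivot_first(rows, ["c1", "type"], "clm", "val")

# If c1 varies within a type, grouping by it splits that type
# across several output rows.
rows_varying = rows + [{"c1": 12, "type": "t1", "clm": "b", "val": 7}]
split = pivot_first(rows_varying, ["c1", "type"], "clm", "val")

print(by_type)
print(by_all)
print(split)
```

In this model, adding c1 to the grouping key is harmless as long as c1 is constant within each type; once it varies, type t1 ends up on two rows, which is the situation where the grouping has to be chosen to fit the data.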
Upvotes: 1