Reputation: 187
I have this dataframe "df":
Country | customer | total |
---|---|---|
England | John | 4 |
France | Sally | 3 |
England | John | 2 |
France | Sally | 1 |
I would like to create a new df called "out", where I group the "Country" column in rows and the "customer" one in columns. The cells shouls show the sum:
Country | John | Sally |
---|---|---|
England | 6 | 0 |
France | 0 | 4 |
I know how to groupby the rows with:
out = df.groupby.("Country").agg(sum("total"))
But I don't know how to groupby the "customer" column in the columns (keeping the first column named as "Country".
Thanks!
Upvotes: 1
Views: 46
Reputation: 310
looks like you want to pivot your data imagin if you have this data set:
+-------+------+-------+
|Product|Amount|Country|
+-------+------+-------+
| Banana| 1000| USA|
|Carrots| 1500| USA|
| Beans| 1600| USA|
| Orange| 2000| USA|
| Orange| 2000| USA|
| Banana| 400| China|
|Carrots| 1200| China|
| Beans| 1500| China|
| Orange| 4000| China|
| Banana| 2000| Canada|
|Carrots| 2000| Canada|
| Beans| 2000| Mexico|
+-------+-----+-------+
and would like to pivot it like this:
+-------+------+-----+------+----+
|Product|Canada|China|Mexico| USA|
+-------+------+-----+------+----+
| Orange| null| 4000| null|4000|
| Beans| null| 1500| 2000|1600|
| Banana| 2000| 400| null|1000|
|Carrots| 2000| 1200| null|1500|
which i think it's the way you want it to be, then you need to run your code like this:
val pivotDF = df.groupBy("Product").pivot("Country").sum("Amount")
pivotDF.show()
For reference: Click Here
Upvotes: 2