Jusep

Reputation: 187

Spark: How to group the dataframe in columns

I have this dataframe "df":

Country customer total
England John 4
France Sally 3
England John 2
France Sally 1

I would like to create a new df called "out", where the "Country" column is grouped into rows and the "customer" column is spread across columns. The cells should show the sum of "total":

Country John Sally
England 6 0
France 0 4

I know how to groupby the rows with:

out = df.groupBy("Country").agg(sum("total"))

But I don't know how to group the "customer" column across the columns (keeping the first column named "Country").

Thanks!

Upvotes: 1

Views: 46

Answers (1)

It looks like you want to pivot your data. Imagine you have this data set:

+-------+------+-------+
|Product|Amount|Country|
+-------+------+-------+
| Banana|  1000|    USA|
|Carrots|  1500|    USA|
|  Beans|  1600|    USA|
| Orange|  2000|    USA|
| Orange|  2000|    USA|
| Banana|   400|  China|
|Carrots|  1200|  China|
|  Beans|  1500|  China|
| Orange|  4000|  China|
| Banana|  2000| Canada|
|Carrots|  2000| Canada|
|  Beans|  2000| Mexico|
+-------+------+-------+

and would like to pivot it like this:

+-------+------+-----+------+----+
|Product|Canada|China|Mexico| USA|
+-------+------+-----+------+----+
| Orange|  null| 4000|  null|4000|
|  Beans|  null| 1500|  2000|1600|
| Banana|  2000|  400|  null|1000|
|Carrots|  2000| 1200|  null|1500|
+-------+------+-----+------+----+

which I think is the way you want it. Then you need to run your code like this:

val pivotDF = df.groupBy("Product").pivot("Country").sum("Amount")
pivotDF.show()
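For the dataframe in the question, the same pattern would be `df.groupBy("Country").pivot("customer").sum("total")`, with `.na.fill(0)` to turn the nulls into the 0s shown in the expected output. Conceptually, pivot is just a group-by on two keys followed by spreading one of the keys into columns; here is a dependency-free Python sketch of that computation on the question's data (plain dicts, not Spark, just to show the logic):

```python
from collections import defaultdict

# The question's data: (Country, customer, total) rows.
rows = [
    ("England", "John", 4),
    ("France", "Sally", 3),
    ("England", "John", 2),
    ("France", "Sally", 1),
]

# Step 1: group by the pair (Country, customer) and sum "total".
sums = defaultdict(int)
for country, customer, total in rows:
    sums[(country, customer)] += total

# Step 2: spread the "customer" key into columns, filling missing
# cells with 0 (Spark's pivot would leave null; na.fill(0) maps it to 0).
countries = sorted({c for c, _ in sums})
customers = sorted({cust for _, cust in sums})
pivoted = {
    country: {cust: sums.get((country, cust), 0) for cust in customers}
    for country in countries
}
print(pivoted)
```

This yields one row per Country with a column per customer, matching the "out" table in the question.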


Upvotes: 2
