Jusep

Reputation: 187

Spark: How to group the dataframe in columns

I have this dataframe "df":

Country customer total
England John 4
France Sally 3
England John 2
France Sally 1

I would like to create a new df called "out", where the "Country" column is grouped into rows and the "customer" column is spread across columns. The cells should show the sum of "total":

Country John Sally
England 6 0
France 0 4

I know how to groupby the rows with:

out = df.groupBy("Country").agg(sum("total"))

But I don't know how to group the "customer" column across the columns (keeping the first column named "Country").

Thanks!

Upvotes: 1

Views: 46

Answers (1)

It looks like you want to pivot your data. Imagine you have this data set:

+-------+------+-------+
|Product|Amount|Country|
+-------+------+-------+
| Banana|  1000|    USA|
|Carrots|  1500|    USA|
|  Beans|  1600|    USA|
| Orange|  2000|    USA|
| Orange|  2000|    USA|
| Banana|   400|  China|
|Carrots|  1200|  China|
|  Beans|  1500|  China|
| Orange|  4000|  China|
| Banana|  2000| Canada|
|Carrots|  2000| Canada|
|  Beans|  2000| Mexico|
+-------+------+-------+

and would like to pivot it like this:

+-------+------+-----+------+----+
|Product|Canada|China|Mexico| USA|
+-------+------+-----+------+----+
| Orange|  null| 4000|  null|4000|
|  Beans|  null| 1500|  2000|1600|
| Banana|  2000|  400|  null|1000|
|Carrots|  2000| 1200|  null|1500|
+-------+------+-----+------+----+

which I think is the way you want it. Then you need to run your code like this:

val pivotDF = df.groupBy("Product").pivot("Country").sum("Amount")
pivotDF.show()
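For the dataframe in the question, the same pattern would be `df.groupBy("Country").pivot("customer").sum("total")`, with `.na.fill(0)` to turn the nulls into the 0s shown in the expected output. Conceptually, pivot is just a group-by on two keys followed by spreading one of the keys into columns; here is a dependency-free Python sketch of that computation on the question's data (plain dicts, not Spark, just to show the logic):

```python
from collections import defaultdict

# The question's data: (Country, customer, total) rows.
rows = [
    ("England", "John", 4),
    ("France", "Sally", 3),
    ("England", "John", 2),
    ("France", "Sally", 1),
]

# Step 1: group by the pair (Country, customer) and sum "total".
sums = defaultdict(int)
for country, customer, total in rows:
    sums[(country, customer)] += total

# Step 2: spread the "customer" key into columns, filling missing
# cells with 0 (Spark's pivot would leave null; na.fill(0) maps it to 0).
countries = sorted({c for c, _ in sums})
customers = sorted({cust for _, cust in sums})
pivoted = {
    country: {cust: sums.get((country, cust), 0) for cust in customers}
    for country in countries
}
print(pivoted)
```

This yields one row per Country with a column per customer, matching the "out" table in the question.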


Upvotes: 2
