Pivot in spark scala

Question

I have a df like this.

+---+-----+-----+----+
|  M|M_Max|Sales|Rank|
+---+-----+-----+----+
| M1|  100|  200|   1|
| M1|  100|  175|   2|
| M1|  101|  150|   3|
| M1|  100|  125|   4|
| M1|  100|   90|   5|
| M1|  100|   85|   6|
| M2|  200| 1001|   1|
| M2|  200|  500|   2|
| M2|  201|  456|   3|
| M2|  200|  345|   4|
| M2|  200|  231|   5|
| M2|  200|  123|   6|
+---+-----+-----+----+

I am doing a pivot operation on top of this df like this.

df.groupBy("M").pivot("Rank").agg(first("Sales")).show
+---+----+---+---+---+---+---+
|  M|   1|  2|  3|  4|  5|  6|
+---+----+---+---+---+---+---+
| M1| 200|175|150|125| 90| 85|
| M2|1001|500|456|345|231|123|
+---+----+---+---+---+---+---+

But my expected output is like below. This means I need to get the column - Max(M_Max) in the output.

Here M_Max is the max of column - M_Max. My Expected Output is like below. is this possible with Pivot function without using df joins.?

+---+----+---+---+---+---+---+-----+
|  M|   1|  2|  3|  4|  5|  6|M_Max|
+---+----+---+---+---+---+---+-----+
| M1| 200|175|150|125| 90| 85|  101|
| M2|1001|500|456|345|231|123|  201|
+---+----+---+---+---+---+---+-----+

Anand Sai · Accepted Answer

The trick is to apply window functions. The solution is given below:

scala> val df = Seq(
     |      | ("M1",100,200,1),
     |      | ("M1",100,175,2),
     |      | ("M1",101,150,3),
     |      | ("M1",100,125,4),
     |      | ("M1",100,90,5),
     |      | ("M1",100,85,6),
     |      | ("M2",200,1001,1),
     |      | ("M2",200,500,2),
     |      | ("M2",200,456,3),
     |      | ("M2",200,345,4),
     |      | ("M2",200,231,5),
     |      | ("M2",201,123,6)
     |      | ).toDF("M","M_Max","Sales","Rank")
df: org.apache.spark.sql.DataFrame = [M: string, M_Max: int ... 2 more fields]

scala> import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.expressions.Window

scala> val w = Window.partitionBy("M")
w: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@49b4e11c

scala> df.withColumn("new", max("M_Max") over (w)).groupBy("M", "new").pivot("Rank").agg(first("Sales")).withColumnRenamed("new", "M_Max").show
+---+-----+----+---+---+---+---+---+
|  M|M_Max|   1|  2|  3|  4|  5|  6|
+---+-----+----+---+---+---+---+---+
| M1|  101| 200|175|150|125| 90| 85|
| M2|  201|1001|500|456|345|231|123|
+---+-----+----+---+---+---+---+---+


scala> df.show
+---+-----+-----+----+
|  M|M_Max|Sales|Rank|
+---+-----+-----+----+
| M1|  100|  200|   1|
| M1|  100|  175|   2|
| M1|  101|  150|   3|
| M1|  100|  125|   4|
| M1|  100|   90|   5|
| M1|  100|   85|   6|
| M2|  200| 1001|   1|
| M2|  200|  500|   2|
| M2|  200|  456|   3|
| M2|  200|  345|   4|
| M2|  200|  231|   5|
| M2|  201|  123|   6|
+---+-----+-----+----+

Let me know if it helps!!

Pivot in spark scala

Answers (2)

Related Questions