Markus

Reputation: 3782

How to add "where" clause to calculating maximum value in Spark?

I use Spark 2.2.0 and Scala 2.11. I want to calculate rank as sold divided by the maximum sold value within the same type (i.e. the type of the current row). But I do not know how to restrict the max to rows of the same type.

This is my current code. It computes sold as the difference between the maximum and minimum stock over the given period of time, i.e. how many units of a product were sold in that period.

val sales = df.select($"product_pk", $"type", $"stock")
  .groupBy($"type", $"product_pk")
  .agg((max($"stock") - min($"stock")).as("sold"))

// This is the part I cannot get right: max($"sold") is not limited
// to rows of the same type (and is an aggregate outside any grouping).
val ranks = sales.withColumn("rank", $"sold" / max($"sold"))

Upvotes: 1

Views: 663

Answers (1)

Ramesh Maharjan

Reputation: 41957

Here's what you can do, if I understood your question correctly:

import org.apache.spark.sql.expressions._
import org.apache.spark.sql.functions.max

val windowSpec = Window.partitionBy("type")
val ranks = sales.withColumn("rank", $"sold" / max($"sold").over(windowSpec))
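To see the window-based max end to end, here is a minimal, self-contained sketch. The SparkSession setup and the sample numbers are assumptions for illustration, not from the original post; only the column names follow the question.

// Minimal sketch with hypothetical data; column names follow the question.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{max, min}

val spark = SparkSession.builder().master("local[*]").appName("rank-demo").getOrCreate()
import spark.implicits._

// Hypothetical stock snapshots: (product_pk, type, stock)
val df = Seq(
  (1, "A", 100), (1, "A", 40),  // product 1 sold 60
  (2, "A", 50),  (2, "A", 30),  // product 2 sold 20
  (3, "B", 80),  (3, "B", 70)   // product 3 sold 10
).toDF("product_pk", "type", "stock")

// sold = max(stock) - min(stock) per product, as in the question
val sales = df.groupBy($"type", $"product_pk")
  .agg((max($"stock") - min($"stock")).as("sold"))

// rank = sold / max(sold) over rows of the same type
val windowSpec = Window.partitionBy("type")
val ranks = sales.withColumn("rank", $"sold" / max($"sold").over(windowSpec))

ranks.show()
// Expected (row order may vary):
//   type A: product 1 -> sold 60, rank 1.0; product 2 -> sold 20, rank ~0.333
//   type B: product 3 -> sold 10, rank 1.0

Note that Spark's / operator casts integer operands to double, so rank comes out as a fractional value without any explicit cast.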

I hope the answer is helpful.

Upvotes: 2
