koh-ding

Reputation: 135

How to do a window partition and extract only unique values per group in Spark Scala

Hi how's it going? I'm having an issue with excluding ties using window partition in scala. I want to collect the top 2 for each group excluding ties. So if there are three values, [5,5,3] for a particular group I want it to return 5 and 3, not 5 and 5.

This is my code so far:

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions.row_number
  import spark.implicits._ // for the $"col" syntax; already in scope in spark-shell

  val dummy_df = Seq(("yankees",5,4),("yankees",3,7),("yankees",5,2),("yankees",3,4),("red sox",7,3),("red sox",2,5),("red sox",2,4)).toDF("team","runs","opponent_runs")
  val team_runs_partition = Window.partitionBy("team").orderBy($"runs".desc)
  dummy_df.withColumn("rn", row_number.over(team_runs_partition)).filter($"rn" <= 2).show

In tabular view the original dataframe looks like this:

  +-------+----+-------------+
  |   team|runs|opponent_runs|
  +-------+----+-------------+
  |yankees|   5|            4|
  |yankees|   3|            7|
  |yankees|   5|            2|
  |yankees|   3|            4|
  |red sox|   7|            3|
  |red sox|   2|            5|
  |red sox|   2|            4|
  +-------+----+-------------+

and my current output dataframe looks like this:

(the filtered output, rn 1 and 2 per team: both yankees rows have runs = 5, while the red sox rows have runs 7 and 2)

However, as you can see, due to the tie it's not returning the top 2 unique "runs" values for the Yankees, but rather two instances of 5. I tried using the rank function, but that's even worse with ties: if I have 5 ties at the top, the next value's rank falls all the way to 6.

How would I solve this?

Thanks so much and have a great day!

Upvotes: 0

Views: 488

Answers (1)

Nir Hedvat

Reputation: 870

If you want to keep the duplicated rows visible while ranking (in case you have additional columns you're interested in), use dense_rank() instead of row_number() and then drop the duplicates:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank, row_number}

val window = Window.partitionBy("team").orderBy(col("runs").desc)
data
  .withColumn("rn", dense_rank() over window) // ties share a rank, so rn <= 2 keeps the top 2 distinct run values
  .filter(col("rn") <= 2)
  ... // TODO whatever ...
  .dropDuplicates("team", "runs")
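Put together with the dummy data from the question, the first approach looks like this (a sketch, assuming a spark-shell or local SparkSession named `spark`):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val dummy_df = Seq(
  ("yankees", 5, 4), ("yankees", 3, 7), ("yankees", 5, 2), ("yankees", 3, 4),
  ("red sox", 7, 3), ("red sox", 2, 5), ("red sox", 2, 4)
).toDF("team", "runs", "opponent_runs")

val window = Window.partitionBy("team").orderBy(col("runs").desc)

// dense_rank gives tied rows the same rank, so filtering rn <= 2 keeps the
// top 2 *distinct* run values; dropDuplicates then keeps one row per value
dummy_df
  .withColumn("rn", dense_rank() over window)
  .filter(col("rn") <= 2)
  .dropDuplicates("team", "runs")
  .show()
// keeps runs 5 and 3 for the yankees, and 7 and 2 for the red sox
```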

If you don't care about the duplicated rows (in 'team' and 'runs'), drop the duplicates before applying your window:

data
  .dropDuplicates("team", "runs")
  .withColumn("rn", row_number() over window)
  .filter(col("rn") <= 2)
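The dedup-first variant on the same data looks like this (again a sketch, assuming a SparkSession named `spark` is in scope). Since the duplicate (team, runs) pairs are removed up front, row_number can no longer pick the same run value twice:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

import spark.implicits._ // assumes an active SparkSession named `spark`

val data = Seq(
  ("yankees", 5, 4), ("yankees", 3, 7), ("yankees", 5, 2), ("yankees", 3, 4),
  ("red sox", 7, 3), ("red sox", 2, 5), ("red sox", 2, 4)
).toDF("team", "runs", "opponent_runs")

val window = Window.partitionBy("team").orderBy(col("runs").desc)

// with duplicate (team, runs) pairs gone, rn 1 and 2 are guaranteed
// to be two distinct run values per team
data
  .dropDuplicates("team", "runs")
  .withColumn("rn", row_number() over window)
  .filter(col("rn") <= 2)
  .show()
```

Note that dropDuplicates keeps an arbitrary row for each (team, runs) pair, so this variant is only appropriate when the other columns (here opponent_runs) don't matter.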

Upvotes: 1
