vincwng

Reputation: 658

Spark Sql Dataset get index number

If I have a case class like this:

case class Person(name: String = null, id: Integer = null, rank: Integer = null)

And I have a dataset: Dataset[Person]

Let's say the dataset has 5 person objects:

Dataset[  Person(name = "Jack",id = 100, rank = null), 
          Person(name = "Mary",id = 400, rank = null),
          Person(name = "Tom",id = 199, rank = null), 
          Person(name = "Linda", id = 55, rank = null),
          Person(name = "Wendy", id = 30, rank = null)]

I want to populate the rank field in Scala, after sorting the dataset by id. So that the dataset becomes:

Dataset[  Person(name = "Wendy", id = 30, rank = 1), 
          Person(name = "Linda", id = 55, rank = 2),
          Person(name = "Jack", id = 100, rank = 3), 
          Person(name = "Tom", id = 199, rank = 4),
          Person(name = "Mary", id = 400, rank = 5)]

Thanks in advance!

Upvotes: 0

Views: 1189

Answers (1)

koiralo

Reputation: 23099

If you have a dataset, you can add a rank column using the row_number window function:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

ds.withColumn("rank", row_number().over(Window.orderBy($"id")))

Alternatively, you can use the rank function (note that rank leaves gaps after tied values, while row_number always produces consecutive numbers):

ds.withColumn("rank", rank().over(Window.orderBy("id")))

def row_number(): Column

Window function: returns a sequential number starting at 1 within a window partition.
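Putting it together, here is a minimal self-contained sketch, assuming a local SparkSession and the case class from the question. Note that withColumn returns a DataFrame, so you need .as[Person] to get a Dataset[Person] back:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.row_number

    case class Person(name: String = null, id: Integer = null, rank: Integer = null)

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rank-example")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      Person("Jack", 100), Person("Mary", 400), Person("Tom", 199),
      Person("Linda", 55), Person("Wendy", 30)
    ).toDS()

    // withColumn yields a DataFrame; .as[Person] converts it back to Dataset[Person].
    // A Window.orderBy without partitionBy pulls all rows into one partition,
    // so Spark logs a performance warning -- fine for small data.
    val ranked = ds
      .withColumn("rank", row_number().over(Window.orderBy($"id")))
      .as[Person]

    ranked.show()

After this, Wendy (id = 30) gets rank 1 and Mary (id = 400) gets rank 5, matching the desired output in the question.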

Hope this helps!

Upvotes: 1
