If I have a case class like this:
case class Person(name: String = null, id: Int, rank: Integer = null)
And I have a dataset: Dataset[Person]
Let's say the dataset has 5 person objects:
Dataset[ Person(name = "Jack", id = 100, rank = null),
Person(name = "Mary",id = 400, rank = null),
Person(name = "Tom",id = 199, rank = null),
Person(name = "Linda", id = 55, rank = null),
Person(name = "Wendy", id = 30, rank = null)]
I want to populate the rank field in Scala by sorting the dataset by id, so that the dataset becomes:
Dataset[ Person(name = "Wendy", id = 30, rank = 1),
Person(name = "Linda", id = 55, rank = 2),
Person(name = "Jack", id = 100, rank = 3),
Person(name = "Tom", id = 199, rank = 4),
Person(name = "Mary", id = 400, rank = 5)]
Thanks in advance!
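If you have a Dataset, you can add the rank column with the row_number window function. This assumes org.apache.spark.sql.expressions.Window, org.apache.spark.sql.functions._ and spark.implicits._ are imported. withColumn replaces the existing null rank column and returns a DataFrame, so .as[Person] turns the result back into a Dataset[Person]:
ds.withColumn("rank", row_number().over(Window.orderBy($"id")))
  .as[Person]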
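To see what row_number over an ordering by id computes, without needing a Spark cluster, here is a plain-Scala sketch on an in-memory Seq rather than a Dataset. It only illustrates the semantics, not how Spark implements it, and the Person definition with id: Int is an assumption based on the question's sample data.

```scala
// Plain-Scala sketch (no Spark): mimics row_number().over(Window.orderBy("id"))
// on an in-memory Seq. The field types are assumed from the question's data.
final case class Person(name: String = null, id: Int, rank: Integer = null)

object RowNumberSketch {
  // Sort by id, then number the rows 1, 2, 3, ... in that order.
  def rowNumber(people: Seq[Person]): Seq[Person] =
    people
      .sortBy(_.id)
      .zipWithIndex
      .map { case (p, i) => p.copy(rank = Integer.valueOf(i + 1)) }

  // The five rows from the question, all with rank = null.
  val people: Seq[Person] = Seq(
    Person(name = "Jack", id = 100),
    Person(name = "Mary", id = 400),
    Person(name = "Tom", id = 199),
    Person(name = "Linda", id = 55),
    Person(name = "Wendy", id = 30)
  )
}
```

Calling RowNumberSketch.rowNumber(RowNumberSketch.people) yields Wendy, Linda, Jack, Tom, Mary with ranks 1 to 5, matching the expected output in the question. In Spark the window does the sorting; here sortBy plus zipWithIndex plays that role.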
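Or with the rank function, which gives rows with equal ids the same rank and skips the following rank(s), whereas row_number always numbers rows 1, 2, 3, ... even when ids tie: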
ds.withColumn("rank", rank().over(Window.orderBy("id"))).as[Person]
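The two functions only differ when ids tie. The plain-Scala sketch below (again no Spark, and again assuming id: Int) contrasts them on a small Seq. The "Ann" row is hypothetical, added only to create a tie with Linda's id.

```scala
// Plain-Scala sketch (no Spark) contrasting row_number and rank semantics
// when two rows share the same id.
final case class Person(name: String = null, id: Int, rank: Integer = null)

object RankVsRowNumber {
  // row_number: 1, 2, 3, ... even for equal ids. sortBy is stable here,
  // whereas Spark makes no ordering guarantee among tied rows.
  def rowNumber(people: Seq[Person]): Seq[Person] =
    people.sortBy(_.id).zipWithIndex.map { case (p, i) =>
      p.copy(rank = Integer.valueOf(i + 1))
    }

  // rank: 1 + number of rows with a strictly smaller id, so tied rows
  // share a rank and the next rank is skipped.
  def rank(people: Seq[Person]): Seq[Person] = {
    val sorted = people.sortBy(_.id)
    sorted.map(p => p.copy(rank = Integer.valueOf(sorted.count(_.id < p.id) + 1)))
  }

  val withTie: Seq[Person] = Seq(
    Person(name = "Wendy", id = 30),
    Person(name = "Linda", id = 55),
    Person(name = "Ann", id = 55), // hypothetical row sharing Linda's id
    Person(name = "Jack", id = 100)
  )
}
```

On this input rowNumber assigns 1, 2, 3, 4 while rank assigns 1, 2, 2, 4. With the question's data, where every id is unique, both give the same result.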
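From the Spark API docs (org.apache.spark.sql.functions):
def row_number(): Column
Window function: returns a sequential number starting at 1 within a window partition.
Since Window.orderBy is used here without partitionBy, the whole Dataset forms a single window partition, so Spark moves all rows into one partition to number them and logs a warning about it. That is fine for a small Dataset but can be slow on large data.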
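"Within a window partition" means the numbering restarts at 1 for each partitionBy key. Below is a plain-Scala sketch of that behavior on an in-memory Seq, mimicking row_number().over(Window.partitionBy("team").orderBy("id")). The Player class and its team field are hypothetical and not part of the question's data.

```scala
// Hypothetical record with a `team` key; not part of the question's Person class.
final case class Player(team: String, name: String, id: Int, rank: Int = 0)

object PartitionedRowNumber {
  // Mimics row_number().over(Window.partitionBy("team").orderBy("id")):
  // rows are grouped by team and numbered 1, 2, 3, ... inside each group.
  def rowNumberByTeam(players: Seq[Player]): Seq[Player] =
    players
      .groupBy(_.team)
      .toSeq
      .sortBy(_._1) // fixed group order, only so the output is predictable
      .flatMap { case (_, group) =>
        group.sortBy(_.id).zipWithIndex.map { case (p, i) => p.copy(rank = i + 1) }
      }

  val sample: Seq[Player] = Seq(
    Player("a", "x", 10),
    Player("b", "y", 5),
    Player("a", "z", 3)
  )
}
```

Here team "a" gets ranks 1 and 2 (ids 3 and 10) and team "b" starts again at 1. Without a partition key, as in the answer above, the whole Dataset is one group and the ranks run 1 to n.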
Hope this helps!