abc_spark

Reputation: 383

How to take row_number() based on a condition in Spark with Scala

I have the following DataFrame:

+----+-----+---+
| val|count| id|
+----+-----+---+
|   a|   10| m1|
|   b|   20| m1|
|null|   30| m1|
|   b|   30| m2|
|   c|   40| m2|
|null|   50| m2|
+----+-----+---+

It is created by:

val df1 = Seq(
  ("a", "10", "m1"),
  ("b", "20", "m1"),
  (null, "30", "m1"),
  ("b", "30", "m2"),
  ("c", "40", "m2"),
  (null, "50", "m2")
).toDF("val", "count", "id")

I am trying to assign a rank using row_number() and a window function, as below.

df1.withColumn("rannk_num", row_number() over Window.partitionBy("id").orderBy("count")).show
+----+-----+---+---------+
| val|count| id|rannk_num|
+----+-----+---+---------+
|   a|   10| m1|        1|
|   b|   20| m1|        2|
|null|   30| m1|        3|
|   b|   30| m2|        1|
|   c|   40| m2|        2|
|null|   50| m2|        3|
+----+-----+---+---------+

But I need to exclude the rows where val is null from the numbering: they should stay in the output, with a null rannk_num.

Expected output --

+----+-----+---+---------+
| val|count| id|rannk_num|
+----+-----+---+---------+
|   a|   10| m1|        1|
|   b|   20| m1|        2|
|null|   30| m1|     NULL|
|   b|   30| m2|        1|
|   c|   40| m2|        2|
|null|   50| m2|     NULL|
+----+-----+---+---------+

I am wondering whether this is possible with minimal changes. Also, the columns val and count can hold any number of distinct values.

Upvotes: 1

Views: 3948

Answers (1)

mck

Reputation: 42332

Number only the rows where val is not null, give the rows where val is null a null row number, and union the two results back together.

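import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, row_number}
import spark.implicits._  // needed for toDF; already in scope in spark-shell

val df1 = Seq(
  ("a", "10", "m1"),
  ("b", "20", "m1"),
  (null, "30", "m1"),
  ("b", "30", "m2"),
  ("c", "40", "m2"),
  (null, "50", "m2")
).toDF("val", "count", "id")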

df1.filter("val is not null").withColumn(
    "rannk_num", row_number() over Window.partitionBy("id").orderBy("count")
).union(
    df1.filter("val is null").withColumn("rannk_num", lit(null))
).show
+----+-----+---+---------+
| val|count| id|rannk_num|
+----+-----+---+---------+
|   a|   10| m1|        1|
|   b|   20| m1|        2|
|   b|   30| m2|        1|
|   c|   40| m2|        2|
|null|   30| m1|     null|
|null|   50| m2|     null|
+----+-----+---+---------+
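
If the union is not wanted, the same result can probably be reached in one pass. This is a minimal sketch of an alternative, not the method above. It assumes a local SparkSession, and the names `w` and `ranked` are only illustrative. It adds "val is null" to the partition key so null rows cannot take a rank from non-null rows, and uses when(...) without otherwise(...) to leave a null for the null rows. It also casts count to int for ordering, since count is stored as a string here and strings sort lexicographically ("100" before "20").

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number, when}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("conditional-row-number")
  .getOrCreate()
import spark.implicits._

val df1 = Seq(
  ("a", "10", "m1"),
  ("b", "20", "m1"),
  (null, "30", "m1"),
  ("b", "30", "m2"),
  ("c", "40", "m2"),
  (null, "50", "m2")
).toDF("val", "count", "id")

// Partition on (id, val is null) so that null rows are numbered in their own
// partition and never consume a rank that belongs to a non-null row.
val w = Window
  .partitionBy(col("id"), col("val").isNull)
  .orderBy(col("count").cast("int")) // count is a string column in this example

// when(...) without otherwise(...) yields null for rows that do not match.
val ranked = df1.withColumn(
  "rannk_num",
  when(col("val").isNotNull, row_number().over(w))
)

ranked.orderBy("id", "count").show()
```

For the sample data this should give the same rows as the expected output in the question, with row order kept by the final orderBy. Compared with the union version, every row stays in a single DataFrame, so no row order is lost to the union and the data is only scanned once.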

Upvotes: 4
