Benjamin
Benjamin

Reputation: 3477

Find for each row the first non-null value in a group of columns and the column name

I have a dataframe like this:

col1     col2     col3     Other
====================================
NULL     1        2        A
3        4        5        B
NULL     NULL     NULL     C

and I would like to get as result the following one with this rules:

Expected result:

FirstValue     ColName      Other
====================================
1             col2         A
3             col1         B
NULL          NULL         C

Upvotes: 4

Views: 584

Answers (1)

mck
mck

Reputation: 42422

You can use coalesce:

val df2 = df.select(
    coalesce(df.columns.dropRight(1).map(col):_*).as("FirstValue"), 
    coalesce(df.columns.dropRight(1).map(c => when(col(c).isNotNull, lit(c))):_*).as("ColName"), 
    col("Other")
)


df2.show
+----------+-------+-----+
|FirstValue|ColName|Other|
+----------+-------+-----+
|         1|   col2|    A|
|         3|   col1|    B|
|      null|   null|    C|
+----------+-------+-----+

Upvotes: 4

Related Questions