Bharath
Bharath

Reputation: 13

How to get the column name which is not null

I have a Spark Dataframe as below

ID Col A Col B
1 null Some Value
2 Some Value null

I need to add a new column which contains the column name (among Col A and Col B) which is not null. So the expected dataframe should look like,

ID Col A Col B result
1 null Some Value Col B
2 Some Value null Col A

Any help would be much appreciated.

Thank you!

Upvotes: 0

Views: 899

Answers (1)

ggordon
ggordon

Reputation: 10035

after creating temp views from your dataframe eg

df.createOrReplaceTempView("my_data")

you may run the following on your spark session using newdf = sparkSession.sql("query here")

SELECT
    ID,
    ColA,
    ColB,
    CASE
        WHEN ColA IS NULL AND ColB IS NULL THEN NULL
        WHEN ColB IS NULL THEN 'ColA'
        WHEN ColA IS NULL THEN 'ColB'
        ELSE 'ColA Col B'
    END AS result
FROM my_data

or just using python

from pyspark.sql.functions import when, col

df = df.withColumn("result",when(
    col("Col A").isNull() & col("Col B").isNull() , None
).when(
    col("Col B").isNull() ,'Col A'
).when(
    col("Col A").isNull() ,'Col B'
).otherwise('Col A Col B') 
)

Upvotes: 1

Related Questions