Priyanka

Reputation: 51

Create a new column based on Condition in Spark Dataframe

How can I create a new column in a DataFrame DF based on a given condition? I have an array of strings and want to compare it with a column of an existing DataFrame.

DataFrame DF:

    +-------------------+-----------+
    |     DiffColumnName|   Datatype|
    +-------------------+-----------+
    |  DEST_COUNTRY_NAME| StringType|
    |ORIGIN_COUNTRY_NAME| StringType|
    |              COUNT|IntegerType|
    +-------------------+-----------+

and an array of strings holding column names (this is not constant and can change):

    val diffcolarray = Array("ORIGIN_COUNTRY_NAME", "COUNT")

I want to add a new column to DF based on this condition: if the value in the DiffColumnName column is also present in diffcolarray, the new column should be yes, otherwise no.

I have tried the options below, but both give an error:

    val newdf = df.filter(when(col("DiffColumnName") === df.columns.filter(diffcolarray.contains(_)), "yes").otherwise("no")).as("issue")

    val newdf = valdfe.filter(when(col("DiffColumnName") === df.columns.map(diffcolarray.contains(_)), "yes").otherwise("no")).as("issue")

It looks like there is a datatype mismatch in the comparison. The output should look something like this. Any suggestion would be helpful. Thank you.

    +-------------------+-----------+----------+
    |     DiffColumnName|   Datatype|   Issue  |
    +-------------------+-----------+----------+
    |  DEST_COUNTRY_NAME| StringType|   NO     |
    |ORIGIN_COUNTRY_NAME| StringType|   YES    |
    |              COUNT|IntegerType|   YES    |
    +-------------------+-----------+----------+

Upvotes: 0

Views: 1091

Answers (1)

linusRian

Reputation: 340

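This should give you the desired output. Use withColumn to add the column (filter only drops rows) and isin to test whether each DiffColumnName value appears in the array:

    import org.apache.spark.sql.functions.{col, when}
    df.withColumn("Issue", when(col("DiffColumnName").isin(diffcolarray: _*), "YES").otherwise("NO")).show(false)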
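For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes a local SparkSession and rebuilds the DataFrame from the question by hand; the object name IssueColumnExample and the helper addIssueColumn are hypothetical names chosen for illustration, not part of any Spark API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object IssueColumnExample {

  // Hypothetical helper: adds an "Issue" column that is "YES" when the
  // value in DiffColumnName is one of `names`, and "NO" otherwise.
  def addIssueColumn(df: DataFrame, names: Seq[String]): DataFrame =
    df.withColumn(
      "Issue",
      // isin takes varargs, so the collection is expanded with `: _*`
      when(col("DiffColumnName").isin(names: _*), "YES").otherwise("NO")
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this demonstration
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("issue-column-example")
      .getOrCreate()
    import spark.implicits._

    // Same rows as the DataFrame shown in the question
    val df = Seq(
      ("DEST_COUNTRY_NAME", "StringType"),
      ("ORIGIN_COUNTRY_NAME", "StringType"),
      ("COUNT", "IntegerType")
    ).toDF("DiffColumnName", "Datatype")

    val diffcolarray = Array("ORIGIN_COUNTRY_NAME", "COUNT")

    addIssueColumn(df, diffcolarray.toSeq).show(false)

    spark.stop()
  }
}
```

With the array from the question, both ORIGIN_COUNTRY_NAME and COUNT come out as YES, because both are in diffcolarray, and DEST_COUNTRY_NAME comes out as NO. As for the original attempts, filter expects a boolean condition and only removes rows, and comparing a string column to a whole array with === is the likely source of the datatype mismatch.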

Upvotes: 1
