Daebarkee

Reputation: 683

spark withColumn value generation from all column values

I want to add a new column whose value is derived from all the existing column values in the same row. For example,

col1  col2  ... coln      col_new
--------------------      -------
True  False ... False     "col1-..."
False True  ... True      "col2-...-coln"

That is, when a value is True, append its column name, separated by "-", and keep doing the same through the last column. We don't know in advance how many columns there will be.

How can I achieve this with withColumn() in Spark? (Scala)

Upvotes: 0

Views: 1285

Answers (1)

Ramesh Maharjan

Reputation: 41957

If the columns are all of BooleanType, then you can write a udf function to get the new column as below:

import org.apache.spark.sql.functions._

val columnNames = df.columns

// Zip each row's values with the column names, keep the names whose value is
// true, and join them with "-"
def concatColNames = udf((values: collection.mutable.WrappedArray[Boolean]) =>
  values.zip(columnNames).filter(_._1).map(_._2).mkString("-"))

df.withColumn("col_new", concatColNames(array(df.columns.map(col): _*))).show(false)
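
For reference, a minimal self-contained sketch of how this udf behaves, assuming a small hypothetical DataFrame built from the question's example (the session setup, column names, and sample rows are illustrative only):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("concat-col-names").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data mirroring the question's example rows
val df = Seq(
  (true, false, false),
  (false, true, true)
).toDF("col1", "col2", "col3")

val columnNames = df.columns

// Keep the names of the columns whose value is true and join them with "-"
def concatColNames = udf((values: Seq[Boolean]) =>
  values.zip(columnNames).filter(_._1).map(_._2).mkString("-"))

df.withColumn("col_new", concatColNames(array(df.columns.map(col): _*))).show(false)
// col_new should be "col1" for the first row and "col2-col3" for the second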

If the columns are all of StringType, then you just need to modify the udf function as below:

def concatColNames = udf((values: collection.mutable.WrappedArray[String]) =>
  values.zip(columnNames).filter(_._1 == "True").map(_._2).mkString("-"))

You should get what you require.
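
As an aside (not from the original answer), the boolean case can also be sketched without a udf: turn each column into a when expression that yields the column name or null, then join with concat_ws, which skips nulls.

import org.apache.spark.sql.functions._

// For each column, yield its name when the value is true, otherwise null;
// concat_ws("-", ...) ignores the nulls when joining.
val nameExprs = df.columns.map(c => when(col(c) === true, lit(c)))

df.withColumn("col_new", concat_ws("-", nameExprs: _*)).show(false)

For the StringType case, the condition would be col(c) === "True" instead.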

Upvotes: 3
