Reputation: 683
I want to add a column built from the values of all existing columns in the same row. For example,
col1   col2   ...  coln   col_new
-----  -----  ---  -----  ---------------
True   False  ...  False  "col1-..."
False  True   ...  True   "col2-...-coln"
That is, when a value is True, then add its column name with "-" separator and keep doing the same until the last column. We don't know how many columns we will have.
How can I achieve this with withColumn() in Spark (Scala)?
Upvotes: 0
Views: 1285
Reputation: 41957
If the columns are all BooleanType, you can write a udf function to build the new column, as below:
import org.apache.spark.sql.functions._
// Capture the column names once; the udf zips them with each row's values.
val columnNames = df.columns
// Seq[Boolean] is used instead of WrappedArray so the udf also works on Scala 2.13 builds.
val concatColNames = udf((values: Seq[Boolean]) =>
  values.zip(columnNames).filter(_._1).map(_._2).mkString("-"))
df.withColumn("col_new", concatColNames(array(columnNames.map(col): _*))).show(false)
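To see what the udf body does to a single row, here is a minimal sketch of the same zip/filter/join logic on plain Scala collections, with no Spark involved. The helper name trueColumnNames and the sample column names are hypothetical, chosen only for illustration.

```scala
// Hypothetical helper mirroring the udf body above, without Spark.
def trueColumnNames(values: Seq[Boolean], columnNames: Seq[String]): String =
  values.zip(columnNames)                    // pair each flag with its column name
    .collect { case (true, name) => name }   // keep only the names whose flag is true
    .mkString("-")                           // join the surviving names with "-"

val names = Seq("col1", "col2", "col3")           // assumed sample column names
val joined = trueColumnNames(Seq(false, true, true), names)
```

A row with no True values yields an empty string, which is also what the udf returns in that case.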
If the columns are all StringType, you only need to change the udf function, as below:
// Same idea for string columns: keep the names whose value is the literal "True".
val concatColNames = udf((values: Seq[String]) =>
  values.zip(columnNames).filter(_._1 == "True").map(_._2).mkString("-"))
You should get what you require
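As a possible alternative that avoids a udf, the same result can probably be had with built-in functions: when() without otherwise() yields null for a false condition, and concat_ws() skips nulls. This is a sketch, not part of the answer above; the function name withTrueColumnNames and its newCol parameter are hypothetical, and it assumes every column is BooleanType and that column names contain no dots (those would need backtick quoting).

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, concat_ws, lit, when}

// Hypothetical helper: appends a column listing the names of all true columns.
def withTrueColumnNames(df: DataFrame, newCol: String = "col_new"): DataFrame = {
  // For each column: its own name when the value is true, null otherwise.
  val flagged: Array[Column] = df.columns.map(c => when(col(c), lit(c)))
  // concat_ws ignores nulls, so only the names of true columns are joined.
  df.withColumn(newCol, concat_ws("-", flagged: _*))
}
```

For string columns, the condition would become col(c) === "True" instead of col(c).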
Upvotes: 3