Reputation: 1591
I have a dataframe like below:
Id priority1 priority2 priority3
1 true true true
2 false true true
3 false false false
I need to create a new dataframe, where if first priority is true (order of priority priority1, priority2, priority3), I need to make all other priorities as false. Looking for a dynamic solution where I can add lot more priority columns.
Expected output from the example above:
Id priority1 priority2 priority3 new_priority1 new_priority2 new_priority3
1 true true true true false false
2 false true true false true false
3 false false true false false true
Upvotes: 1
Views: 445
Reputation: 28322
First, put all the priority columns in an array and convert it to contain a single true value using an UDF
. Then to get the array values into their own columns, use a foldLeft
. Using the example input:
val df = Seq((1, true, true, true), (2, false, true, true), (3, false, false, false))
.toDF("Id", "priority1", "priority2", "priority3")
The UDF
and it's usage:
val convertPriorities = udf((prios: Seq[Boolean]) => {
val falseSeq = Seq.fill(prios.length)(false)
prios.indexOf(true) match {
case -1 => falseSeq
case x => falseSeq.updated(x, true)
}
})
val prioColumns = Seq("priority1", "priority2", "priority3")
val df2 = df.withColumn("priorities", convertPriorities(array(prioColumns.map(col(_)):_*)))
Note that a prioColumns
variable is created to make the foldLeft
easier.
val df3 = prioColumns.zipWithIndex
.foldLeft(df2)((df, col) => df.withColumn("new_" + col._1, $"priorities"(col._2)))
.drop("priorities")
This will give the following resulting dataframe:
+---+---------+---------+---------+-------------+-------------+-------------+
| Id|priority1|priority2|priority3|new_priority1|new_priority2|new_priority3|
+---+---------+---------+---------+-------------+-------------+-------------+
| 1| true| true| true| true| false| false|
| 2| false| true| true| false| true| false|
| 3| false| false| false| false| false| false|
+---+---------+---------+---------+-------------+-------------+-------------+
Using this approach it should be very easy to extend to use more columns, the only change that needs to be done is to add the columns to the prioColumns
variable. This could be done by getting all column name and applying a filter if (as in the example above) the columns have similar names.
Upvotes: 2