Reputation: 37
I have a list of column names which contains column names , I am iterating a row and checking if it contains 1 then appending that column name to a list.Can It be better using zip or something.
private def getNames(row: Row): List[String] = {
val listofColumnNames = row.schema.fields.map(_.name).toList
val listOfColumnWhichContainOne = ArrayBuffer[String]()
listofColumnNames.indices.foreach(index => {
if(row.getInt(index).equals(1)) {
listOfColumnWhichContainOne.append(listofColumnNames(index))
}
})
listofColumnNames.toList
}
Can It be simplified ?
Upvotes: 1
Views: 192
Reputation: 14875
You can add a new column to an existing dataframe that contains a list of all columns in which for that particular row the field has the value 1
.
Within the column paramater of withColumn you can iterate over all other columns and check for the wanted value:
val df = Seq((1, 2, 3), (4, 5, 6), (3, 2, 1)).toDF("col1", "col2", "col3")
df.show()
val cols = df.schema.fieldNames //change this array according to your needs
//if you want to exclude columns from the check
df.withColumn("result", array(
cols.map {
c: String => when(col(c).equalTo(1), c)
}: _*
)).show()
prints:
//input data
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1| 2| 3|
| 4| 5| 6|
| 3| 1| 1|
+----+----+----+
//result
+----+----+----+--------------+
|col1|col2|col3| result|
+----+----+----+--------------+
| 1| 2| 3| [col1,,]|
| 4| 5| 6| [,,]|
| 3| 1| 1|[, col2, col3]|
+----+----+----+--------------+
Upvotes: 1