vasu seth

Reputation: 37

How to extract column names which have a specific value

I have a list of column names. I iterate over a row and, for every field that contains 1, append the corresponding column name to a list. Can this be done better using zip or something similar?

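import org.apache.spark.sql.Row
import scala.collection.mutable.ArrayBuffer

private def getNames(row: Row): List[String] = {
    val listofColumnNames = row.schema.fields.map(_.name).toList
    val listOfColumnWhichContainOne = ArrayBuffer[String]()
    listofColumnNames.indices.foreach(index => {
      // getInt assumes every column is a non-null integer column
      if (row.getInt(index) == 1) {
        listOfColumnWhichContainOne.append(listofColumnNames(index))
      }
    })
    // return the collected names, not the full list of column names
    listOfColumnWhichContainOne.toList
}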

Can it be simplified?

Upvotes: 1

Views: 192

Answers (1)

werner

Reputation: 14875

You can add a new column to an existing DataFrame that contains, for each row, an array with the names of all columns whose value in that row is 1. Columns that do not match show up as null entries in the array.

Within the column parameter of withColumn you can iterate over all other columns and check for the wanted value:

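// assumes a SparkSession named spark is in scope (as in spark-shell)
import org.apache.spark.sql.functions.{array, col, when}
import spark.implicits._

val df = Seq((1, 2, 3), (4, 5, 6), (3, 1, 1)).toDF("col1", "col2", "col3")
df.show()

val cols = df.schema.fieldNames //change this array according to your needs
                                //if you want to exclude columns from the check

// when(...) without otherwise(...) yields null for non-matching columns
df.withColumn("result", array(
   cols.map {
       c: String => when(col(c).equalTo(1), c)
       }: _*
)).show()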

prints:

//input data
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   2|   3|
|   4|   5|   6|
|   3|   1|   1|
+----+----+----+

//result
+----+----+----+--------------+
|col1|col2|col3|        result|
+----+----+----+--------------+
|   1|   2|   3|      [col1,,]|
|   4|   5|   6|          [,,]|
|   3|   1|   1|[, col2, col3]|
+----+----+----+--------------+
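
If you want to stay at the row level instead, the zip idea from the question can be sketched in plain Scala. This is only a minimal sketch: namesWithValue and the ColumnNamesWithValue object are hypothetical names, and the values are passed in as a plain Seq. With a Spark Row, the names could presumably come from row.schema.fieldNames and the values from row.toSeq, but that part is an assumption and is not shown here.

```scala
object ColumnNamesWithValue {
  // Hypothetical helper: pair each column name with its value
  // and keep only the names whose value equals `wanted`.
  def namesWithValue(names: Seq[String], values: Seq[Any], wanted: Any): List[String] =
    names.zip(values).collect { case (name, value) if value == wanted => name }.toList

  def main(args: Array[String]): Unit = {
    val names = Seq("col1", "col2", "col3")
    // Values of the third example row (3, 1, 1)
    println(namesWithValue(names, Seq(3, 1, 1), 1))
  }
}
```

Unlike the array column above, this returns only the matching names, so there are no null placeholders to filter out afterwards.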

Upvotes: 1
