Ayan Chatterjee

Reputation: 41

org.apache.spark.sql.AnalysisException:

 df.withColumn(x, when($"x" > 75, $"x" + 10).otherwise($"x")).show()
org.apache.spark.sql.AnalysisException: cannot resolve '`x`' given input columns: [Name, Subject, Marks];;
'Project [Name#7, Subject#8, CASE WHEN ('x > 75) THEN ('x + 10) ELSE 'x END AS Marks#38]

scala> df.show()
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
| Ram|Physics|   80|
|Sham|English|   90|
|Ayan|   Math|   70|
+----+-------+-----+


scala> x
res6: String = Marks

I want to pass a variable as a parameter that stores a column name of the dataframe. Based on that parameter, the operation should check a condition, calculate a value, and replace the column of the same name in the dataframe.

Actually the bigger problem is that there are multiple columns with similar names, like "col1", "col2", "col3", .... I will store these columns in an array and iterate over the array, passing each element into the dataframe operation. Please let me know how this can be handled in Spark/Scala.
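For the multi-column case described above, one common pattern is to fold over the array of column names with `foldLeft`. This is a minimal sketch, assuming the columns are numeric and the same `> 75` rule applies to each; `columnsToUpdate` is a hypothetical name:

```scala
import org.apache.spark.sql.functions.{col, when}

// hypothetical array of column names to rewrite
val columnsToUpdate = Array("col1", "col2", "col3")

// Each step threads the updated DataFrame into the next withColumn call,
// replacing every listed column in place under its own name.
val result = columnsToUpdate.foldLeft(df) { (acc, c) =>
  acc.withColumn(c, when(col(c) > 75, col(c) + 10).otherwise(col(c)))
}
```

Because `withColumn` returns a new DataFrame, `foldLeft` lets you apply it once per column without mutable state.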

Upvotes: 0

Views: 1054

Answers (3)

s.polam

Reputation: 10382

For better understanding, I have separated the columns into requiredColumns & allColumns.

Check the code below.

scala> df.show(false)
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
|Ram |Physics|80   |
|Sham|English|90   |
|Ayan|Math   |70   |
+----+-------+-----+
scala> val requiredColumns = Set("Marks")
requiredColumns: scala.collection.immutable.Set[String] = Set(Marks)
scala> val allColumns = df.columns
allColumns: Array[String] = Array(Name, Subject, Marks)
scala> 
val columnExpr = allColumns
                   .filterNot(requiredColumns(_))
                   .map(col(_)) ++ requiredColumns
                   .map(c => when(col(c) > 75, col(c) + 10).otherwise(col(c)).as(c))

Output

scala> df.select(columnExpr:_*).show(false)
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
|Ram |Physics|90   |
|Sham|English|100  |
|Ayan|Math   |70   |
+----+-------+-----+

Upvotes: 0

notNull

Reputation: 31540

Try with string interpolation: col(s"${x}").

Example:

val df=Seq(("Ram","Physics",80),("Sham","English",90),("Ayan","Math",70)).toDF("Name","Subject","Marks")

df.show()
//+----+-------+-----+
//|Name|Subject|Marks|
//+----+-------+-----+
//| Ram|Physics|   80|
//|Sham|English|   90|
//|Ayan|   Math|   70|
//+----+-------+-----+

import org.apache.spark.sql.functions._
val x:String = "Marks"

df.withColumn(x, when(col(s"${x}") > 75, col(s"${x}") + 10).otherwise(col(s"${x}"))).show()
//+----+-------+-----+
//|Name|Subject|Marks|
//+----+-------+-----+
//| Ram|Physics|   90|
//|Sham|English|  100|
//|Ayan|   Math|   70|
//+----+-------+-----+

Upvotes: 2

Som

Reputation: 6338

Use functions.col as below:

df1.show(false)

/**
  * +----+-------+-----+
  * |Name|Subject|Marks|
  * +----+-------+-----+
  * |Ram |Physics|80   |
  * |Sham|English|90   |
  * |Ayan|Math   |70   |
  * +----+-------+-----+
  */

val x = "Marks"
// use functions.col
df1.withColumn(x, when(col(x) > 75, col(x) + 10).otherwise(col(x)))
  .show()

/**
  * +----+-------+-----+
  * |Name|Subject|Marks|
  * +----+-------+-----+
  * | Ram|Physics|   90|
  * |Sham|English|  100|
  * |Ayan|   Math|   70|
  * +----+-------+-----+
  */
Upvotes: 0
