Reputation: 41
df.withColumn(x, when($"x" > 75, $"x" + 10).otherwise($"x")).show()
org.apache.spark.sql.AnalysisException: cannot resolve '`x`' given input columns: [Name, Subject, Marks];;
'Project [Name#7, Subject#8, CASE WHEN ('x > 75) THEN ('x + 10) ELSE 'x END AS Marks#38]
scala> df.show()
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
| Ram|Physics| 80|
|Sham|English| 90|
|Ayan| Math| 70|
+----+-------+-----+
scala> x
res6: String = Marks
I want to pass a variable as a parameter that stores a column name of the dataframe. Based on that parameter it should check the condition, calculate the value, and replace the column of the same name in that dataframe.
Actually the bigger problem is that there are multiple columns with similar names, like "col1","col2","col3".... I will store these columns in an array and iterate over it, passing each array value into the dataframe operation. But for now, please let me know how this problem can be handled in Spark Scala.
Upvotes: 0
Views: 1054
Reputation: 10382
For better understanding I have separated the columns as requiredColumns & allColumns.
Check the code below.
scala> df.show(false)
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
|Ram |Physics|80 |
|Sham|English|90 |
|Ayan|Math |70 |
+----+-------+-----+
scala> val requiredColumns = Set("Marks")
requiredColumns: scala.collection.immutable.Set[String] = Set(Marks)
scala> val allColumns = df.columns
allColumns: Array[String] = Array(Name, Subject, Marks)
scala>
val columnExpr = allColumns
  .filterNot(requiredColumns.contains(_))   // pass the other columns through unchanged
  .map(col(_)) ++ requiredColumns
  .map(c => when(col(c) > 75, col(c) + 10).otherwise(col(c)).as(c))   // apply the rule, keep the column name
Output
scala> df.select(columnExpr:_*).show(false)
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
|Ram |Physics|90 |
|Sham|English|100 |
|Ayan|Math |70 |
+----+-------+-----+
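Since requiredColumns is a Set, the same code covers the multi-column case from your question without any change - just list every column that needs the rule. A small sketch with made-up column names:

scala> val requiredColumns = Set("col1", "col2", "col3")
requiredColumns: scala.collection.immutable.Set[String] = Set(col1, col2, col3)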
Upvotes: 0
Reputation: 31540
Try with string interpolation: col(s"${x}").
Example:
val df=Seq(("Ram","Physics",80),("Sham","English",90),("Ayan","Math",70)).toDF("Name","Subject","Marks")
df.show()
//+----+-------+-----+
//|Name|Subject|Marks|
//+----+-------+-----+
//| Ram|Physics| 80|
//|Sham|English| 90|
//|Ayan| Math| 70|
//+----+-------+-----+
import org.apache.spark.sql.functions._
val x:String = "Marks"
df.withColumn(x, when(col(s"${x}") > 75, col(s"${x}") + 10).otherwise(col(s"${x}"))).show()
//+----+-------+-----+
//|Name|Subject|Marks|
//+----+-------+-----+
//| Ram|Physics| 90|
//|Sham|English| 100|
//|Ayan| Math| 70|
//+----+-------+-----+
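If you need to apply the same rule to an array of column names (the "col1","col2","col3" case from your question), you can fold over the array. A minimal sketch, assuming hypothetical names and the same threshold as above:

val targetCols = Array("col1", "col2", "col3")  // hypothetical column names
val result = targetCols.foldLeft(df) { (acc, c) =>
  // overwrite each listed column with the conditional value
  acc.withColumn(c, when(col(c) > 75, col(c) + 10).otherwise(col(c)))
}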
Upvotes: 2
Reputation: 6338
Use functions.col as below. Unlike $"x", which refers to a column literally named x, col(x) resolves the runtime value of the variable x.
df1.show(false)
/**
* +----+-------+-----+
* |Name|Subject|Marks|
* +----+-------+-----+
* |Ram |Physics|80 |
* |Sham|English|90 |
* |Ayan|Math |70 |
* +----+-------+-----+
*/
val x = "Marks"
// use functions.col
df1.withColumn(x, when(col(x) > 75, col(x) + 10).otherwise(col(x)))
.show()
/**
* +----+-------+-----+
* |Name|Subject|Marks|
* +----+-------+-----+
* | Ram|Physics| 90|
* |Sham|English| 100|
* |Ayan| Math| 70|
* +----+-------+-----+
*/
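If you prefer a SQL expression, an equivalent sketch using functions.expr (same logic, just written as SQL):

// x holds the column name, so the interpolated string becomes
// "CASE WHEN Marks > 75 THEN Marks + 10 ELSE Marks END"
df1.withColumn(x, expr(s"CASE WHEN $x > 75 THEN $x + 10 ELSE $x END"))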
Upvotes: 0