Reputation: 710
In the following code, I expected the compiler to recognize that output gets defined either in the if section or in the else section.
val df1 = spark.createDataFrame(Seq(
(1, 10),
(2, 20)
)).toDF("A", "B")
val df2 = spark.emptyDataFrame
if(df2.isEmpty){
val output = df1
}
else{
val output = df2
}
println(output.show)
However, it fails with error: not found: value output. If I write the exact same thing in Python, it works fine and I get the expected output. To make this work in Spark with Scala, I defined output as a mutable variable and update it inside the if-else:
var output = spark.emptyDataFrame
if(df2.isEmpty){
output = df1
}
else{
output = df2
}
println(output.show)
Why doesn't the first implementation work and is there a way to get the expected outcome without using a mutable variable?
Upvotes: 2
Views: 943
Reputation: 22850
I suspect you come from a Python background where this kind of behavior is allowed.
In Scala this does not work as written, because each branch of the if / else is its own block, and a name defined inside a block is visible only within that block.
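To illustrate the scoping rule without Spark, here is a minimal sketch using plain strings. The names ScopeDemo, demo, seen and inner are made up for this example and do not come from the question.

```scala
object ScopeDemo {
  // Hypothetical helper: shows that `inner` lives only inside the if block.
  def demo(flag: Boolean): String = {
    var seen = "nothing"        // declared in the enclosing block, visible below
    if (flag) {
      val inner = "inner"       // exists only until the closing brace
      seen = inner              // copying the value out is allowed
    }
    // println(inner)           // would not compile: not found: value inner
    seen
  }
}
```

Calling ScopeDemo.demo(true) returns "inner", while ScopeDemo.demo(false) returns "nothing". The commented-out line fails for the same reason as println(output.show) in the question.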
You may fix this by using a mutable variable...
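import org.apache.spark.sql.DataFrame
// Placeholder initial value; "= _" compiles for fields and in the REPL, but not for a local variable inside a method
var output: DataFrame = null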
if(df2.isEmpty){
output = df1
}
else{
output = df2
}
However, this is very Java-like and goes against the principle of immutability.
In Scala, if / else is an expression, so it evaluates to a value (for a block, that is the value of its last expression).
Thus, the more idiomatic way to solve the problem in Scala is:
val output = if(df2.isEmpty) df1 else df2
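The same pattern can be tried without a Spark session. The sketch below is only an illustration: it uses Seq[Int] as a stand-in for DataFrame, and the names IfExpressionDemo, pick, primary and fallback are hypothetical.

```scala
object IfExpressionDemo {
  // Hypothetical helper: returns `fallback` when `primary` is empty, otherwise `primary`.
  // The whole if/else is a single expression, so no var is needed.
  def pick(primary: Seq[Int], fallback: Seq[Int]): Seq[Int] =
    if (primary.isEmpty) fallback else primary

  def main(args: Array[String]): Unit = {
    val output = pick(Seq.empty, Seq(1, 2))  // primary is empty, so fallback is chosen
    println(output)                          // prints List(1, 2)
  }
}
```

Both branches should have compatible types; otherwise the compiler infers their common supertype for the result, which may be wider than intended.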
Upvotes: 7