I am applying some transformations to my DataFrame using Spark 2.2.0 and Scala 2.11.
The problem occurs with this line of code:
Math.abs($"right.product_price".asInstanceOf[Double] - $"left.product_price".asInstanceOf[Double])
I want to calculate the absolute difference between left.product_price and right.product_price. If either of these columns contains null, the null should be treated as 0.
However, I get an error: "Type mismatch: expected String, actual Column". How can I do this calculation correctly?
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, row_number, struct}
import spark.implicits._ // for the $"..." column syntax
val result = df.as("left")
  // self-join by gender:
  .join(df.as("right"), $"left.gender" === $"right.gender")
  // limit to 10 results per record:
  .withColumn("rn", row_number().over(Window.partitionBy($"left.product_PK").orderBy($"right.product_PK")))
  .filter($"rn" <= 10).drop($"rn")
  // group and collect_list to create products column:
  .groupBy($"left.product_PK" as "product_PK")
  .agg(collect_list(struct($"right.product_PK", Math.abs($"right.product_price".asInstanceOf[Double] - $"left.product_price".asInstanceOf[Double]))) as "products")
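You cannot use Math.abs, and you cannot use asInstanceOf: $"right.product_price" is a Column, an expression that Spark evaluates later for each row, not a Double value. Math.abs only works on plain JVM numbers, and asInstanceOf does not convert a column's data. Use the SQL function functions.abs and the Column method cast instead: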
import org.apache.spark.sql.functions.{abs, collect_list, struct}
...
.agg(collect_list(struct(
  $"right.product_PK",
  abs($"right.product_price".cast("double") - $"left.product_price".cast("double"))
)) as "products")
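A minimal, self-contained sketch of the same idea on a toy DataFrame, assuming a local SparkSession; the object name AbsDiffDemo, the column names left_price and right_price, and the sample rows are hypothetical. The prices are stored as strings, so cast("double") does the conversion inside Spark, and abs is applied to the resulting Column. Note that a null price still produces a null difference, because Spark SQL arithmetic propagates null.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{abs, col}

object AbsDiffDemo {
  // Adds |right_price - left_price| as a double column. It is built from Column
  // expressions, so Spark evaluates it per row (a null input gives a null result).
  def absDiff(df: DataFrame): DataFrame =
    df.withColumn("price_diff",
      abs(col("right_price").cast("double") - col("left_price").cast("double")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("abs-diff-demo").getOrCreate()
    import spark.implicits._
    // Hypothetical data with prices stored as strings; cast("double") converts them inside Spark
    val df = Seq(("10.5", "7.0"), ("3", "8"), ("4", null)).toDF("left_price", "right_price")
    absDiff(df).show()
    spark.stop()
  }
}
```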
To convert null to 0, add coalesce, which returns its first non-null argument:
import org.apache.spark.sql.functions.{coalesce, lit}
coalesce(column, lit(0))
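Combining abs, cast and coalesce, the full query from the question might look like the following sketch. The object and helper names (SimilarProducts, price, build), the struct field name price_diff and the sample data are hypothetical; col("...") is used in place of $"..." so that build does not need spark.implicits. As with the question's join, each product also matches itself, with a difference of 0.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{abs, coalesce, col, collect_list, lit, row_number, struct}

object SimilarProducts {
  // Price of one side of the self-join ("left" or "right") as a double, null replaced by 0.0
  private def price(side: String): Column =
    coalesce(col(s"$side.product_price").cast("double"), lit(0.0))

  def build(df: DataFrame): DataFrame =
    df.as("left")
      // self-join by gender
      .join(df.as("right"), col("left.gender") === col("right.gender"))
      // number the matches of each left product and keep at most 10
      .withColumn("rn", row_number().over(
        Window.partitionBy(col("left.product_PK")).orderBy(col("right.product_PK"))))
      .filter(col("rn") <= 10)
      .drop("rn")
      // one row per left product with a list of (product_PK, price_diff) structs
      .groupBy(col("left.product_PK").as("product_PK"))
      .agg(collect_list(struct(
        col("right.product_PK").as("product_PK"),
        abs(price("right") - price("left")).as("price_diff")
      )).as("products"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("similar-products").getOrCreate()
    import spark.implicits._
    // Hypothetical sample data; product 3 has no price, so it is treated as 0
    val df = Seq((1, "F", Some(10.0)), (2, "F", Some(7.5)), (3, "M", None))
      .toDF("product_PK", "gender", "product_price")
    build(df).show(false)
    spark.stop()
  }
}
```

collect_list does not guarantee element order after a groupBy, so do not rely on each products list being sorted by product_PK.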