Markus

Reputation: 3782

Type mismatch: expected String, actual Column

I am performing some transformations on my DataFrame using Spark 2.2.0 and Scala 2.11.

The problem occurs with this line of code: Math.abs($"right.product_price".asInstanceOf[Double] - $"left.product_price".asInstanceOf[Double]). I want to calculate the absolute difference between left.product_price and right.product_price. If either of these columns contains null, then null should be converted to 0.

However, I get an error: "Type mismatch: expected String, actual Column". How can I do this calculation in a correct way?

val result = df.as("left")
    // self-join by gender:
    .join(df.as("right"), $"left.gender" === $"right.gender")
    // limit to 10 results per record:
    .withColumn("rn", row_number().over(Window.partitionBy($"left.product_PK").orderBy($"right.product_PK")))
    .filter($"rn" <= 10).drop($"rn")
    // group and collect_list to create products column:
    .groupBy($"left.product_PK" as "product_PK")
    .agg(collect_list(struct($"right.product_PK", Math.abs($"right.product_price".asInstanceOf[Double] - $"left.product_price".asInstanceOf[Double]))) as "products")

Upvotes: 2

Views: 2842

Answers (1)

user8967692

Reputation: 26

You cannot use Math.abs and you cannot use asInstanceOf on a Column. Use the SQL functions abs and cast instead:

import org.apache.spark.sql.functions.abs

...
  .agg(collect_list(struct(
    $"right.product_PK",
    abs($"right.product_price".cast("double") - $"left.product_price".cast("double"))
  )) as "products")

To convert null to 0, wrap the column in coalesce:

import org.apache.spark.sql.functions.{coalesce, lit}

coalesce(column, lit(0))
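Putting the pieces together, the price-difference column could be built like this (a sketch, not tested against your data; it assumes an active SparkSession named spark so that $ is available via its implicits):

```scala
import org.apache.spark.sql.functions.{abs, coalesce, lit}
import spark.implicits._

// Treat a null price as 0.0 before taking the absolute difference
val priceDiff = abs(
  coalesce($"right.product_price".cast("double"), lit(0.0)) -
  coalesce($"left.product_price".cast("double"), lit(0.0))
)
```

This expression can then replace the second field inside the struct in the collect_list aggregation above.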

Upvotes: 1
