Reputation: 1386
Assume the following DataFrame:
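import spark.implicits._   // needed for .toDS; spark-shell imports this automatically
import org.apache.spark.sql.functions.col

val df = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .option("ignoreTrailingWhiteSpace", "true")
  .csv(spark.sparkContext.parallelize(
"""key,pct1,pct2,pct3,factor
a,.1,.2,.3,5
b,.1,.2,.3,5"""
    .split("\n")).toDS)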
df.show
+--------+----+----+----+------+
|     key|pct1|pct2|pct3|factor|
+--------+----+----+----+------+
|       a| 0.1| 0.2| 0.3|     5|
|       b| 0.1| 0.2| 0.3|     5|
+--------+----+----+----+------+
The following works just fine:
df.withColumn("New", df.columns.filter(_.contains("pct")).map(col)
  .reduceLeft((cur, next) => (next - cur) / col("factor"))).show
+--------+----+----+----+------+--------------------+
|     key|pct1|pct2|pct3|factor|                 New|
+--------+----+----+----+------+--------------------+
|       a| 0.1| 0.2| 0.3|     5|0.055999999999999994|
|       b| 0.1| 0.2| 0.3|     5|0.055999999999999994|
+--------+----+----+----+------+--------------------+
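To see where 0.055999999999999994 comes from, the fold that reduceLeft builds can be traced with plain Scala Doubles for a single row. This is a sketch outside Spark; the object name ReduceLeftTrace is made up for illustration. Because the inputs here are real Doubles rather than Columns, scala.math.pow is perfectly usable in this setting.

```scala
// Plain-Scala sketch (no Spark): trace the reduceLeft fold for one row.
// ReduceLeftTrace is a hypothetical name used only for this illustration.
object ReduceLeftTrace {
  // Same shape as the lambda above: (cur, next) => (next - cur) / divisor,
  // applied left to right over pct1, pct2, pct3.
  def fold(pcts: Seq[Double], divisor: Double): Double =
    pcts.reduceLeft((cur, next) => (next - cur) / divisor)

  def main(args: Array[String]): Unit = {
    val pcts = Seq(0.1, 0.2, 0.3) // pct1, pct2, pct3 from row "a"
    val factor = 5.0

    // Step 1: (0.2 - 0.1) / 5  = 0.02
    // Step 2: (0.3 - 0.02) / 5 = 0.056 (up to floating-point rounding)
    println(fold(pcts, factor))

    // With factor squared: (0.2 - 0.1) / 25 = 0.004, then (0.3 - 0.004) / 25 = 0.01184.
    // scala.math.pow works here only because factor is a plain Double.
    println(fold(pcts, math.pow(factor, 2)))
  }
}
```

Note that each step feeds the previous result into cur, so the result is not a simple sum of differences.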
But I cannot get the factor column to work with the power function:
df.withColumn("New", df.columns.filter(_.contains("pct")).map(col)
  .reduceLeft((cur, next) => (next - cur) / scala.math.pow(col("factor"),2))).show
error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: Double
.reduceLeft((cur, next) => (next - cur) / scala.math.pow(col("factor"),2))).show
Why can I use col("factor") in the first example but not when I pass it to the power function?
Upvotes: 0
Views: 325
Reputation: 10382
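Replace scala.math.pow with org.apache.spark.sql.functions.pow (import org.apache.spark.sql.functions.pow) and it will work.
scala.math.pow only accepts plain Double arguments, which is why the compiler reports a Column where a Double is required. functions.pow instead takes a Column such as col("factor") and returns a new Column, so it composes with the other column expressions. The first example works because the / operator on Column accepts another Column as its right-hand operand.
Check the code below: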
scala> df.withColumn("New", df.columns.filter(_.contains("pct")).map(col(_)).reduceLeft((cur, next) => (next - cur) / pow(col("factor"),2))).show(false)
+---+-----+-----+-----+------+-------+
|key|pct1 |pct2 |pct3 |factor|New    |
+---+-----+-----+-----+------+-------+
|a  |0.1  |0.2  |0.3  |5     |0.01184|
|b  |0.1  |0.2  |0.3  |5     |0.01184|
+---+-----+-----+-----+------+-------+
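The difference between the two pow functions can be illustrated with a toy model of a column expression: building the expression only records what to compute, and numbers appear later when it is evaluated against a row. This is a hypothetical sketch, not Spark's real classes; Expr, ColRef, Lit, Sub, Div, Pow and ToyFunctions are invented names. It is meant only to show why a function typed as (Double, Double) => Double cannot accept a column, while a column-aware pow can.

```scala
// Toy model (hypothetical, NOT Spark's API) of column expressions.
// Building an Expr describes a computation; eval performs it for one row.
sealed trait Expr {
  def -(that: Expr): Expr = Sub(this, that)
  def /(that: Expr): Expr = Div(this, that)
}
final case class ColRef(name: String) extends Expr
final case class Lit(value: Double) extends Expr
final case class Sub(l: Expr, r: Expr) extends Expr
final case class Div(l: Expr, r: Expr) extends Expr
final case class Pow(base: Expr, exp: Expr) extends Expr

object ToyFunctions {
  def col(name: String): Expr = ColRef(name)

  // Column-aware pow: returns a new expression node instead of a number,
  // similar in spirit to org.apache.spark.sql.functions.pow.
  def pow(base: Expr, exp: Double): Expr = Pow(base, Lit(exp))

  // Only at evaluation time do actual Doubles appear, so math.pow is fine here.
  def eval(e: Expr, row: Map[String, Double]): Double = e match {
    case ColRef(n) => row(n)
    case Lit(v)    => v
    case Sub(l, r) => eval(l, row) - eval(r, row)
    case Div(l, r) => eval(l, row) / eval(r, row)
    case Pow(b, x) => math.pow(eval(b, row), eval(x, row))
  }

  def main(args: Array[String]): Unit = {
    val row = Map("pct1" -> 0.1, "pct2" -> 0.2, "pct3" -> 0.3, "factor" -> 5.0)

    // math.pow(col("factor"), 2) would not compile: col("factor") is an Expr
    // describing a value, not a Double.
    val newCol = Seq("pct1", "pct2", "pct3").map(col)
      .reduceLeft((cur, next) => (next - cur) / pow(col("factor"), 2))

    println(newCol)            // the expression tree that was built
    println(eval(newCol, row)) // approximately 0.01184
  }
}
```

Spark's Column plays a similar role: it describes a computation that Spark evaluates later, which is why column-level functions from org.apache.spark.sql.functions are needed instead of scala.math.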
Upvotes: 3