Spark - How to get value, not column itself?

Question

Assume the following dataframe

val df = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .option("ignoreTrailingWhiteSpace", "true")
  .csv(spark.sparkContext.parallelize(
    """key,pct1,pct2,pct3,factor
       a,.1,.2,.3,5
       b,.1,.2,.3,5"""
    .split("\n")).toDS)

df.show

+--------+----+----+----+------+
|     key|pct1|pct2|pct3|factor|
+--------+----+----+----+------+
|       a| 0.1| 0.2| 0.3|     5|
|       b| 0.1| 0.2| 0.3|     5|
+--------+----+----+----+------+

The following works just fine

df.withColumn("New", df.columns.filter(_.contains("pct")).map(col)
  .reduceLeft((cur, next) => (next - cur) / col("factor"))).show

+--------+----+----+----+------+--------------------+
|     key|pct1|pct2|pct3|factor|                 New|
+--------+----+----+----+------+--------------------+
|       a| 0.1| 0.2| 0.3|     5|0.055999999999999994|
|       b| 0.1| 0.2| 0.3|     5|0.055999999999999994|
+--------+----+----+----+------+--------------------+

But I cannot get the factor column to work with the power function.

df.withColumn("New", df.columns.filter(_.contains("pct")).map(col)
  .reduceLeft((cur, next) => (next - cur) / scala.math.pow(col("factor"),2))).show

error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: Double
         .reduceLeft((cur, next) => (next - cur) / scala.math.pow(col("factor"),2))).show

How come I can retrieve col("factor") in the first example but not when I apply the power function?

s.polam · Accepted Answer

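scala.math.pow takes plain Double arguments, but col("factor") is a Column: an expression that Spark evaluates per row when the query runs, not a value you can read while building it. Your first example compiles only because Column defines its own / operator. Replace scala.math.pow with org.apache.spark.sql.functions.pow, which accepts a Column and returns a Column.

For example: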

scala> df.withColumn("New", df.columns.filter(_.contains("pct")).map(col(_)).reduceLeft((cur, next) => (next - cur) / pow(col("factor"),2))).show(false)
+---+-----+-----+-----+------+-------+
|key|pct1 |pct2 |pct3 |factor|New    |
+---+-----+-----+-----+------+-------+
|a  |0.1  |0.2  |0.3  |5     |0.01184|
|b  |0.1  |0.2  |0.3  |5     |0.01184|
+---+-----+-----+-----+------+-------+
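If it helps, here is a self-contained sketch of the same fix next to two alternatives. The object name PowOnColumn, the helper addNew, the extra column names NewUdf and NewConst, and the local session settings are all made up for illustration. The UDF variant wraps scala.math.pow so it receives each row's value. The last variant pulls factor to the driver as a real value, which is only reasonable if factor is the same in every row (as it is in your sample data).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, pow, udf}

object PowOnColumn {
  // Hypothetical helper: adds three variants of the "New" column.
  def addNew(df: DataFrame): DataFrame = {
    val pctCols = df.columns.filter(_.contains("pct")).map(col)

    // 1. functions.pow takes a Column and returns a Column, evaluated per row.
    val viaPow = pctCols.reduceLeft((cur, next) =>
      (next - cur) / pow(col("factor"), 2))

    // 2. Wrap scala.math.pow in a UDF so it is called with each row's Double.
    val square = udf((x: Double) => scala.math.pow(x, 2))
    val viaUdf = pctCols.reduceLeft((cur, next) =>
      (next - cur) / square(col("factor").cast("double")))

    // 3. Read the value on the driver. Assumes factor is an Int column
    //    and is constant across all rows; otherwise this is wrong.
    val factorValue: Int = df.select("factor").first().getInt(0)
    val squared: Double = scala.math.pow(factorValue, 2)
    val viaConst = pctCols.reduceLeft((cur, next) => (next - cur) / squared)

    df.withColumn("New", viaPow)
      .withColumn("NewUdf", viaUdf)
      .withColumn("NewConst", viaConst)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pow-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 0.1, 0.2, 0.3, 5), ("b", 0.1, 0.2, 0.3, 5))
      .toDF("key", "pct1", "pct2", "pct3", "factor")

    addNew(df).show(false)
    spark.stop()
  }
}
```

The built-in pow is usually the one to prefer, since Spark can optimize built-in functions but treats a UDF as a black box.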
