Reputation: 347
I want to do something like this:
df
.withColumn("newCol", <some formula>)
.filter(s"""newCol > ${(math.min(max("newCol").asInstanceOf[Double],10))}""")
Exception I'm getting:
org.apache.spark.sql.Column cannot be cast to java.lang.Double
Can you please suggest me a way to achieve what i want?
Upvotes: 1
Views: 12598
Reputation: 211
The solution is two parts,
Part I
Find the maximum value,
df.select(max($"col1")).first()(0)
Part II
Use that value to filter on it
df.filter($"col1" === df.select(max($"col1")).first()(0)).show
Bonus
To avoid potential errors, you can also get the maximum value in a specific format you need, using the .get
family on it df.select(max($"col1")).first.getDouble(0)
In this case col1
is DoubleType, so I chose to pick it in the correct format. You can get pretty much all other types. Options are:
getBoolean, getClass, getDecimal, getFloat, getJavaMap, getLong, getSeq, getString, getTimestamp, getAs, getByte, getDate, getDouble, getInt, getList, getMap, getShort, getStruct, getValuesMap
Making the full solution in this case
df.filter($"col1" === df.select(max($"col1")).first.getDouble(0)).show
Upvotes: 0
Reputation: 27373
I assume newCol
is already present in df
, then:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
df
.withColumn("max_newCol",max($"newCol").over(Window.partitionBy()))
.filter($"newCol"> least($"max_newCol",lit(10.0)))
Instead of max($"newCol").over(Window.partitionBy())
you can also jjst write max($"newCol").over()
Upvotes: 5
Reputation: 35229
I'd separate both steps and either:
val newDF = df
.withColumn("newCol", <some formula>)
// Spark 2.1 or later
// With 1.x use join
newDf.alias("l").crossJoin(
newDf.alias("r")).where($"l.newCol" > least($"r.newCol", lit(10.0)))
or
newDf.where(
$"newCol" > (newDf.select(max($"newCol")).as[Double].first min 10.0))
Upvotes: 0
Reputation: 176
I think dataframe describe function is what you are looking for.
ds.describe("age", "height").show()
// output:
// summary age height
// count 10.0 10.0
// mean 53.3 178.05
// stddev 11.6 15.7
// min 18.0 163.0
// max 92.0 192.0
Upvotes: 0