Reputation: 21690
How would I express the following code in Scala via the DataFrame API?
sqlContext.read.parquet("/input").registerTempTable("data")
sqlContext.udf.register("median", new Median)
sqlContext.sql(
"""
|SELECT
| param,
| median(value) as median
|FROM data
|GROUP BY param
""".stripMargin).registerTempTable("medians")
I've started with:
val data = sqlContext.read.parquet("/input")
sqlContext.udf.register("median", new Median)
data.groupBy("param")
But then I'm not sure how to call the median function.
Upvotes: 1
Views: 2381
Reputation: 330063
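You can either use callUDF (from org.apache.spark.sql.functions), which looks the function up by the name it was registered under: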
data.groupBy("param").agg(callUDF("median", $"value"))
or call it directly:
val median = new Median
data.groupBy("param").agg(median($"value"))
// Equivalent to
data.groupBy("param").agg(new Median()($"value"))
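Still, I think it would make more sense to define Median as an object rather than a class, since then there is nothing to instantiate and it can be applied to a column directly.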
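The question does not show how Median is defined, so the following is only a rough sketch of the object variant. It assumes Median is a UserDefinedAggregateFunction (the Spark 1.5-era UDAF API) and that value is a double column. The body is a hypothetical stand-in that buffers every value of a group, so it only suits small groups:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Hypothetical stand-in for the question's Median, written as an object.
// Assumes the aggregated column is a DoubleType column.
object Median extends UserDefinedAggregateFunction {
  // One double input column.
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)

  // The buffer keeps every non-null value seen so far for the group.
  def bufferSchema: StructType = StructType(StructField("values", ArrayType(DoubleType)) :: Nil)

  def dataType: DataType = DoubleType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit =
    buffer(0) = Seq.empty[Double]

  // Append the incoming value, skipping nulls.
  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) buffer(0) = buffer.getSeq[Double](0) :+ input.getDouble(0)

  // Concatenate the partial buffers produced on different partitions.
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getSeq[Double](0) ++ buffer2.getSeq[Double](0)

  // Sort the collected values and pick the middle one
  // (the mean of the two middle values for an even count, null for an empty group).
  def evaluate(buffer: Row): Any = {
    val sorted = buffer.getSeq[Double](0).sorted
    val n = sorted.length
    if (n == 0) null
    else if (n % 2 == 1) sorted(n / 2)
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }
}
```

With import sqlContext.implicits._ in scope for the $ syntax, the call then becomes data.groupBy("param").agg(Median($"value").as("median")), where the alias mirrors the "as median" in the original SQL.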
Upvotes: 1