Reactormonk

Reputation: 21690

How to call a UDF using Scala

How would I express the following code in Scala via the DataFrame API?

sqlContext.read.parquet("/input").registerTempTable("data")
sqlContext.udf.register("median", new Median)
sqlContext.sql(
  """
    |SELECT
    |  param,
    |  median(value) as median
    |FROM data
    |GROUP BY param
""".stripMargin).registerTempTable("medians")

I've started with:

val data = sqlContext.read.parquet("/input")
sqlContext.udf.register("median", new Median)
data.groupBy("param")

But then I'm not sure how to call the median function.

Upvotes: 1

Views: 2381

Answers (1)

zero323

Reputation: 330063

You can either use callUDF (from org.apache.spark.sql.functions), which refers to the function by its registered name:

data.groupBy("param").agg(callUDF("median", $"value"))

or call it directly:

val median = new Median
data.groupBy("param").agg(median($"value"))

// Equivalent to
data.groupBy("param").agg(new Median()($"value"))
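To mirror the SQL from the question end to end, including the `median` alias and the `medians` temp table, something along these lines should work. This is a sketch that assumes Spark 1.5 or later (where `callUDF` takes a name followed by columns), an existing `sqlContext`, and the question's own `Median` class:

```scala
import org.apache.spark.sql.functions.callUDF
import sqlContext.implicits._  // enables the $"column" syntax

val data = sqlContext.read.parquet("/input")

// Register the aggregate under the name that callUDF looks up.
sqlContext.udf.register("median", new Median)

data.groupBy("param")
  .agg(callUDF("median", $"value").alias("median"))  // median(value) AS median
  .registerTempTable("medians")
```

The `alias` call plays the role of `as median` in the SQL version, so the registered table exposes the same column names.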

Still, I think it would make more sense to define Median as an object rather than a class.
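If `Median` is your own code, the simplest route is to change its declaration from `class` to `object`. When the class has to stay as it is, a singleton can extend it instead. The name `MedianUdaf` below is hypothetical, and the sketch assumes `Median` is a non-final `UserDefinedAggregateFunction` with a no-argument constructor, as `new Median` in the question suggests:

```scala
import sqlContext.implicits._  // enables the $"column" syntax

// Hypothetical singleton; assumes Median (from the question) is a non-final
// UserDefinedAggregateFunction with a no-argument constructor.
object MedianUdaf extends Median

// The inherited apply(Column*) method lets the object be called like a function.
data.groupBy("param").agg(MedianUdaf($"value"))

// Registration for use from SQL works the same way as with an instance.
sqlContext.udf.register("median", MedianUdaf)
```

This avoids creating a new instance at every call site, which is the main reason an object reads more naturally here.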

Upvotes: 1
