Reputation: 9759
The Spark documentation describes how to create both an untyped user-defined aggregate function (aka UDAF) and a strongly-typed aggregator (i.e. a subclass of org.apache.spark.sql.expressions.Aggregator).
I know you can register a UDAF for use in SQL via spark.udf.register("udafName", udafInstance), then use it like spark.sql("SELECT udafName(V) as aggV FROM data").
Is there a way to use an aggregator in sql too?
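For context, here is a minimal sketch of the kind of strongly-typed Aggregator I mean (the Record case class and SumV object are hypothetical names, just for illustration):

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical input record type for the typed Dataset.
case class Record(v: Long)

// A strongly-typed aggregator that sums Record.v.
object SumV extends Aggregator[Record, Long, Long] {
  def zero: Long = 0L                                // initial buffer value
  def reduce(b: Long, a: Record): Long = b + a.v     // fold one record in
  def merge(b1: Long, b2: Long): Long = b1 + b2      // combine partial sums
  def finish(reduction: Long): Long = reduction      // final result
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

// This works on a typed Dataset[Record]:
//   ds.select(SumV.toColumn)
// but I don't see how to register it for use from spark.sql(...).
```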
Upvotes: 2
Views: 599
Reputation: 26
Not really. The Aggregator API is designed specifically with "strongly" typed Datasets in mind. You'll notice that it doesn't take Columns but always operates on whole record objects.
This doesn't really fit the SQL processing model:
- In SQL you work with a Dataset[Row], so there is not much use for an Aggregator there.
- An Aggregator takes a complete Row, not the individual columns you'd pass to a SQL function.
For use with the SQL API you can create a UserDefinedAggregateFunction, which can be registered using the standard methods.
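A minimal sketch of that approach, assuming a simple sum over a long column (the class name LongSum is hypothetical; the overridden methods are the standard UserDefinedAggregateFunction contract):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// An untyped UDAF: operates on Columns/Rows, so it fits the SQL model.
class LongSum extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", LongType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("sum", LongType) :: Nil)
  def dataType: DataType = LongType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L

  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)

  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)

  def evaluate(buffer: Row): Any = buffer.getLong(0)
}

// Register and call it from SQL, as in the question:
//   spark.udf.register("longSum", new LongSum)
//   spark.sql("SELECT longSum(V) as aggV FROM data")
```

Because update receives a Row built from the columns named in the SQL call, the function plugs directly into the untyped query path, unlike an Aggregator.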
Upvotes: 1