turtlemonvh

Reputation: 9759

How to use a custom type-safe aggregator in Spark SQL

The Spark documentation describes how to create both an untyped user-defined aggregate function (code) (aka a UDAF) and a strongly-typed aggregator (code) (i.e. a subclass of org.apache.spark.sql.expressions.Aggregator).

I know you can register a UDAF for use in SQL via spark.udf.register("udafName", udafInstance), and then use it like spark.sql("SELECT udafName(V) AS aggV FROM data").
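For context, a minimal sketch of that registration flow; MyAverage stands in for the UDAF class from the linked docs, and the data view is assumed to already exist:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("udaf-demo").getOrCreate()

    // MyAverage is a UserDefinedAggregateFunction subclass, e.g. the one
    // from the Spark docs example linked above.
    spark.udf.register("udafName", new MyAverage)

    // "data" is assumed to be a registered table or temp view with a column V.
    val aggregated = spark.sql("SELECT udafName(V) AS aggV FROM data")
    aggregated.show()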

Is there a way to use an Aggregator in SQL too?

Upvotes: 2

Views: 599

Answers (1)

user10010023

Reputation: 26

Not really. The Aggregator API is designed specifically with strongly-typed Datasets in mind. You'll notice that it doesn't take Columns but always operates on whole record objects.

This doesn't really fit into SQL processing model:

  • In SQL you always operate on Dataset[Row], where an Aggregator is of little use.
  • Operations are applied to columns, while an Aggregator consumes a complete record object, as the sketch below illustrates.
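
To make the contrast concrete, here is a minimal sketch of a typed Aggregator (the Record case class and the SumValues name are illustrative):

    import org.apache.spark.sql.expressions.Aggregator
    import org.apache.spark.sql.{Encoder, Encoders}

    case class Record(key: String, value: Long)

    // The IN type parameter is the whole record, not a column.
    object SumValues extends Aggregator[Record, Long, Long] {
      def zero: Long = 0L
      def reduce(buf: Long, r: Record): Long = buf + r.value
      def merge(b1: Long, b2: Long): Long = b1 + b2
      def finish(buf: Long): Long = buf
      def bufferEncoder: Encoder[Long] = Encoders.scalaLong
      def outputEncoder: Encoder[Long] = Encoders.scalaLong
    }

    // Only usable through the typed Dataset API:
    // val ds: Dataset[Record] = ...
    // ds.select(SumValues.toColumn)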

For use with the SQL API you can create a UserDefinedAggregateFunction, which can be registered using the standard methods, as in the sketch below.
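
For example, a minimal sum UDAF might look like this (the LongSum name and the column names are illustrative):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    // A minimal long-sum UDAF; it operates column-wise, so it fits SQL.
    class LongSum extends UserDefinedAggregateFunction {
      def inputSchema: StructType = StructType(StructField("value", LongType) :: Nil)
      def bufferSchema: StructType = StructType(StructField("sum", LongType) :: Nil)
      def dataType: DataType = LongType
      def deterministic: Boolean = true
      def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L
      def update(buffer: MutableAggregationBuffer, input: Row): Unit =
        if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)
      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
        buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
      def evaluate(buffer: Row): Long = buffer.getLong(0)
    }

    // Register and call from SQL:
    // spark.udf.register("long_sum", new LongSum)
    // spark.sql("SELECT long_sum(V) AS aggV FROM data")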

Upvotes: 1
