Reputation: 654
I have 3 columns in a DataFrame: [time: Timestamp, col1: Double, col2: Double]. I want to perform the following operation:
dataFrame.withWatermark("time", "10 seconds")
  .groupBy(window(col("time"), "10 seconds", "1 second"))
  .agg(mean("col1") /* over a 10-second window */, max("col2") /* over a 5-second window */)
Upvotes: 1
Views: 888
Reputation: 1
Dynamic rules that contain multiple aggregations (avg, max, etc., all supported in Spark batch) cannot be applied to Spark Structured Streaming as of 2.2. Even if you break the query into separate aggregations and join the pieces, Spark still considers it multiple streaming aggregations and throws the exception.
Example from the logical plan:
Aggr1: Aggregate [EventTime#29, CategoryName#15], [EventTime#29, CategoryName#15, sum(ItemValue#10) AS sum(ItemValue)#64]
Aggr2: Aggregate [EventTime#84, CategoryName#105], [EventTime#84, CategoryName#105, avg(ItemValue#100) AS avg(ItemValue)#78]
org.apache.spark.sql.AnalysisException: Multiple streaming aggregations are not supported with streaming DataFrames/Datasets;;
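A minimal sketch of the split-and-join pattern described above, assuming a streaming source with the EventTime, CategoryName, and ItemValue columns seen in the logical plan (the rate source and the derived columns are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("MultiAggRepro").master("local[*]").getOrCreate()
import spark.implicits._

// The built-in rate source yields (timestamp, value); derive the example columns from it.
val events = spark.readStream.format("rate").load()
  .select(
    col("timestamp").as("EventTime"),
    (col("value") % 3).cast("string").as("CategoryName"),
    col("value").cast("double").as("ItemValue"))

// Two separate aggregations over the same stream...
val sums = events.groupBy($"EventTime", $"CategoryName").agg(sum($"ItemValue"))
val avgs = events.groupBy($"EventTime", $"CategoryName").agg(avg($"ItemValue"))

// ...joined back into one plan. Spark still sees multiple streaming
// aggregations, so starting the query fails with the AnalysisException above.
val joined = sums.join(avgs, Seq("EventTime", "CategoryName"))
joined.writeStream.format("console").outputMode("complete").start()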
Upvotes: -1
Reputation: 1808
Multiple aggregations on different sets of keys (a different window means different grouping keys) in a single streaming query are not yet supported. You would have to run two different queries.
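A minimal sketch of the two-query workaround, assuming the schema from the question (the rate source and the random columns are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("TwoQueries").master("local[*]").getOrCreate()

// Placeholder stream with the question's schema: time, col1, col2.
val df = spark.readStream.format("rate").load()
  .withColumnRenamed("timestamp", "time")
  .withColumn("col1", rand())
  .withColumn("col2", rand())

// Query 1: mean of col1 over a 10-second window.
val meanQuery = df.withWatermark("time", "10 seconds")
  .groupBy(window(col("time"), "10 seconds", "1 second"))
  .agg(mean("col1"))
  .writeStream.format("console").outputMode("append").start()

// Query 2: max of col2 over a 5-second window.
val maxQuery = df.withWatermark("time", "10 seconds")
  .groupBy(window(col("time"), "5 seconds", "1 second"))
  .agg(max("col2"))
  .writeStream.format("console").outputMode("append").start()

// Each start() launches an independent query; most sources are read once per query.
spark.streams.awaitAnyTermination()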
Upvotes: 2