Naman Agarwal

Reputation: 654

How to perform multiple time-window operations on a streaming DataFrame?

I have 3 columns in my DataFrame: [time: Timestamp, col1: Double, col2: Double]. I want to perform the following operation:

dataFrame.withWatermark("time", "10 seconds")
         .groupBy(window(col("time"), "10 seconds", "1 second"))
         .agg(mean("col1") /* with a window of 10 seconds */,
              max("col2")  /* with a window of 5 seconds */)

Upvotes: 1

Views: 888

Answers (2)

Shashi

Reputation: 1

Dynamic rules that contain multiple aggregations (avg, max, etc., which Spark batch supports) cannot be applied in Spark Structured Streaming as of 2.2. Even breaking the query into separate aggregations and joining them does not help: Spark still treats it as multiple streaming aggregations and throws an exception.

Example from the logical plan:

Aggr1: Aggregate [EventTime#29, CategoryName#15], [EventTime#29, CategoryName#15, sum(ItemValue#10) AS sum(ItemValue)#64]

Aggr2: Aggregate [EventTime#84, CategoryName#105], [EventTime#84, CategoryName#105, avg(ItemValue#100) AS avg(ItemValue)#78]

org.apache.spark.sql.AnalysisException: Multiple streaming aggregations are not supported with streaming DataFrames/Datasets;;

Upvotes: -1

Tathagata Das

Reputation: 1808

Multiple aggregations on different sets of grouping keys (a different window means a different grouping key) in a single streaming query are not yet supported. You would have to run two separate queries.
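A minimal sketch of the two-query workaround, assuming a streaming DataFrame `df` with the schema from the question (the variable names, output mode, and console sink are illustrative choices, not from the original post):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("two-windows").getOrCreate()

// Hypothetical streaming source with columns [time, col1, col2]
val df = spark.readStream
  .format("rate") // placeholder source; substitute your real stream
  .load()
  .selectExpr("timestamp AS time",
              "CAST(value AS DOUBLE) AS col1",
              "CAST(value AS DOUBLE) AS col2")

// Query 1: mean of col1 over a 10-second sliding window
val meanQuery = df
  .withWatermark("time", "10 seconds")
  .groupBy(window(col("time"), "10 seconds", "1 second"))
  .agg(mean("col1").as("mean_col1"))
  .writeStream
  .outputMode("append")
  .format("console")
  .start()

// Query 2: max of col2 over a 5-second sliding window.
// This must be a separate streaming query, because one query
// cannot mix two different window groupings.
val maxQuery = df
  .withWatermark("time", "10 seconds")
  .groupBy(window(col("time"), "5 seconds", "1 second"))
  .agg(max("col2").as("max_col2"))
  .writeStream
  .outputMode("append")
  .format("console")
  .start()

spark.streams.awaitAnyTermination()
```

Each `start()` launches an independent streaming query, so the two windows are computed in parallel; the cost is that the source is read twice.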

Upvotes: 2
