codiction
codiction

Reputation: 621

Flink reduce on a keyed stream without window

I read the source code , reduce will forword every result to downstream.


I want reduce a stream by key without window,

    stream.keyBy(key)
          .reduce((a, b) -> {
                //reduce
                return a+b;
          });

if reduce on window, flink will forword element to downstream when watermark arrived, so how flink determine reduce finish without window.

Upvotes: 5

Views: 2717

Answers (3)

henry zhu
henry zhu

Reputation: 651

If you do reduce without window, Flink will emit a partial aggregated record after each element the reduce operator encountered. It's very dangerous for the performance, and in most cases it's not what you want.

Here is a quick Scala example to show the problem:

package org.example

import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, createTypeInformation}

object TestReduce {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val stream = env.fromCollection(Seq(
      ("k1", 1),
      ("k1", 1),
      ("k1", 1),
      ("k1", 1),
      ("k1", 1),
      ("k1", 1))
    )

    stream
      .keyBy(_._1)
      .reduce((p, q) => (p._1, p._2 + q._2))
      .print()

    env.execute("test reduce")
  }
}

Running the code above will output:

1> (k1,1)
1> (k1,2)
1> (k1,3)
1> (k1,4)
1> (k1,5)
1> (k1,6)

Upvotes: 0

David Anderson
David Anderson

Reputation: 43439

With stream processing, in general there isn't the idea that computations "finish". They just keep going indefinitely. The non-windowed reduce just keeps on reducing for as long as you leave the job running.

Upvotes: 2

Ezequiel
Ezequiel

Reputation: 3592

According to the official documentation https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/

Reduce KeyedStream → DataStream

A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value.

Window Reduce WindowedStream → DataStream

Applies a functional reduce function to the window and returns the reduced value.

The key difference is that:

  • When reduce is done in a Window, the function combines the current value with the window one.
  • When reduce is done in a KeyedStream, the function combines the current value with the latest one.

Upvotes: 3

Related Questions