sclee1

Reputation: 1281

Extensive benchmark for Flink in stream processing

I am using Flink at my company, and I am planning to run several scenarios to measure the performance of each case.

Below are the scenarios that I will work on:

  1. Experiments

For exactly-once, I will use the TwoPhaseCommitSink. Before running the experiments, I have some questions, listed below.

  1. The performance of the sink

I will use MySQL (an RDB) as the sink. Are there any detailed benchmark results for an RDB sink under at-least-once or exactly-once semantics? I expect a database sink to reduce throughput, because connecting to and communicating with the database takes time. However, I cannot find any documentation or technical blog posts showing detailed Flink benchmark results with an RDB sink. In particular, I wonder whether exactly-once performs so much worse than at-least-once that it is too slow for commercial use. So my questions are as follows.

  1. Are there any informative benchmark results for the two semantics (at-least-once and exactly-once) with a database sink (MySQL or Redis)?

  2. Will end-to-end exactly-once semantics be very slow with a MySQL sink? I plan to use the TwoPhaseCommitSink.

Thanks.

Upvotes: 0

Views: 206

Answers (1)

David Anderson

Reputation: 43409

A few reactions:

  • Simple, generic Flink benchmarks are pretty useless as predictors of specific application performance. So much depends on what a specific job is doing, and there's a lot of room for optimization.
  • Exactly-once with two-phase commit sinks is costly in terms of latency, but not so bad with respect to throughput. The issue is that the commit has to be done in concert with a checkpoint. If you need to checkpoint frequently in order to reduce the latency, then that will more significantly harm the throughput.
  • Unaligned checkpoints and the changelog state backend can make a big difference for some use cases. If you want to include these in your testing, be sure to use Flink 1.16, which saw significant improvements in these areas.
  • The Flink project has invested quite a bit in having a suite of benchmarks that run on every commit. See https://github.com/apache/flink-benchmarks and http://codespeed.dak8s.net:8000/ for more info.
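
To make the latency versus throughput point about two-phase commit sinks more concrete, here is a rough back-of-the-envelope model in plain Java. It is only a sketch under stated assumptions, not a Flink benchmark and not Flink API code: the class and method names are made up for illustration, records are assumed to arrive at a fixed rate, each record is assumed to become visible in the database only when the next checkpoint completes, and each checkpoint is assumed to cost a fixed amount of processing time.

```java
public class TwoPhaseCommitLatencyModel {

    /**
     * Mean delay (ms) between a record arriving and the commit that makes it
     * visible downstream. Assumption: commits happen exactly at multiples of
     * the checkpoint interval, and a record arriving at time t is committed
     * at the next such multiple strictly after t.
     */
    static double meanVisibilityDelayMs(long checkpointIntervalMs, long recordEveryMs, long runMs) {
        long totalDelay = 0;
        long count = 0;
        for (long t = 0; t < runMs; t += recordEveryMs) {
            long commitTime = (t / checkpointIntervalMs + 1) * checkpointIntervalMs;
            totalDelay += commitTime - t;
            count++;
        }
        return (double) totalDelay / count;
    }

    /**
     * Fraction of time left for processing. Assumption (hypothetical): every
     * checkpoint stalls processing for a fixed checkpointCostMs.
     */
    static double throughputFraction(long checkpointIntervalMs, long checkpointCostMs) {
        return Math.max(0.0, 1.0 - (double) checkpointCostMs / checkpointIntervalMs);
    }

    public static void main(String[] args) {
        long checkpointCostMs = 50;          // hypothetical fixed cost per checkpoint
        long[] intervals = {100, 1_000, 10_000};
        for (long interval : intervals) {
            System.out.printf("interval=%d ms  mean delay=%.1f ms  throughput=%.0f%%%n",
                    interval,
                    meanVisibilityDelayMs(interval, 1, 60_000),
                    100.0 * throughputFraction(interval, checkpointCostMs));
        }
    }
}
```

In this toy model the mean visibility delay is roughly half the checkpoint interval, while the throughput penalty only becomes large once the interval shrinks toward the per-checkpoint cost. Real numbers depend heavily on the job, the state size, and the database, so measure your own pipeline rather than relying on this.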

Upvotes: 1
