user3497321
user3497321

Reputation: 533

Flink + connect dataStreams generating showing huge records in source

We're consuming data from a compacted kafka topic using upsert-kafka connector and converting the same to a DataStream using .toChangelogStream. We have another DataStream from source kafka created using kafka connector. Now, we are creating Keyed streams from both and connecting them using a KeyedCoProcessFunction implementation.

In Flink UI, we see huge number of records in the source kafka operator, even though we do not have those many records being published to that topic. When I avoid .connect I could see less records in the source operator.

enter image description here

The source stream has ~50K records and the compacted stream has ~200 records. In the KeyedCoProcessFunction we are looking for a common key and emitting the required records.

What am I missing here? Can someone please help on this?

Upvotes: 0

Views: 116

Answers (1)

user3497321
user3497321

Reputation: 533

There was no issue with the source or with the code. Flink UI has combined the operations and showed it as one operator. I updated the parallelism for few operations and then UI has shown the expected numbers.

Upvotes: 0

Related Questions