lai yoke hman

Reputation: 271

Multiple KCL applications with the same application name reading from one Kinesis stream

I'm confused about how the KCL works. First of all, here is my current understanding.

If I create multiple (let's say 3) KCL applications with different application names, then they are basically different applications reading from the same stream, isolated from each other by having separate DynamoDB tables. All 3 of them will read all x shards in the stream and keep track of their checkpoints separately.
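To make that mental model concrete, here is a toy sketch in plain Java (it does not use the KCL). It assumes only what the paragraph above says: each application name gets its own lease table, standing in for the per-application DynamoDB table. The class name `SeparateApps`, the app names and the shard IDs are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, not KCL code: one "lease table" per application name,
// standing in for the DynamoDB table the KCL creates for each application.
public class SeparateApps {

    // applicationName -> (shardId -> last checkpointed sequence number)
    private final Map<String, Map<String, Long>> leaseTables = new HashMap<>();

    // An application with its own table sees every shard, so it checkpoints all of them.
    public void consumeAll(String applicationName, List<String> shardIds, long checkpoint) {
        Map<String, Long> table =
                leaseTables.computeIfAbsent(applicationName, name -> new TreeMap<>());
        for (String shardId : shardIds) {
            table.put(shardId, checkpoint);
        }
    }

    // Returns the lease table of one application (empty if it never ran).
    public Map<String, Long> table(String applicationName) {
        return leaseTables.getOrDefault(applicationName, Map.of());
    }

    public static void main(String[] args) {
        List<String> shards = List.of("shard-0", "shard-1", "shard-2");
        SeparateApps model = new SeparateApps();
        // Three differently named applications progress independently on the same shards.
        model.consumeAll("app-a", shards, 100L);
        model.consumeAll("app-b", shards, 40L);
        model.consumeAll("app-c", shards, 7L);
        System.out.println(model.table("app-a")); // {shard-0=100, shard-1=100, shard-2=100}
        System.out.println(model.table("app-b")); // {shard-0=40, shard-1=40, shard-2=40}
    }
}
```

The point of the sketch is only that the checkpoints never interfere: "app-a" being at 100 says nothing about where "app-b" is.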

Based on a few docs I've read, for example: https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-scaling.html

I would assume that if I create another KCL application with the same application name, there would be 2 KCL applications working on the same stream, with the shards load-balanced between the 2 workers in the 2 apps.

So, technically, I can create 8 KCL apps (let's say there are 8 shards in the stream) on 8 EC2 instances, and each of them will process exactly one shard without clashing, since each of them checkpoints in its own row in the DynamoDB table.
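A toy sketch of that shared-table scenario, again in plain Java rather than KCL code: all workers with the same application name share one lease table, and each shard's row has exactly one owner. The round-robin assignment is a simplification I'm assuming for illustration; the real KCL reaches a balanced spread gradually as workers take and steal leases. `SharedLeaseTable`, the worker IDs and the shard IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, not KCL code: workers sharing one application name share one lease table,
// and each shard lease (one row per shard) has exactly one owner at a time.
public class SharedLeaseTable {

    // shardId -> owning workerId (a single table for the whole application)
    private final Map<String, String> leaseOwner = new TreeMap<>();

    // Simplified balancing: spread shards evenly by round-robin.
    // The real KCL converges gradually by taking expired leases and stealing surplus ones.
    public void balance(List<String> shardIds, List<String> workerIds) {
        if (workerIds.isEmpty()) {
            throw new IllegalArgumentException("need at least one worker");
        }
        leaseOwner.clear();
        for (int i = 0; i < shardIds.size(); i++) {
            leaseOwner.put(shardIds.get(i), workerIds.get(i % workerIds.size()));
        }
    }

    // Group the lease table by worker to see which shards each one processes.
    public Map<String, List<String>> shardsPerWorker() {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> lease : leaseOwner.entrySet()) {
            result.computeIfAbsent(lease.getValue(), w -> new ArrayList<>()).add(lease.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> shards = new ArrayList<>();
        List<String> workers = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            shards.add("shard-" + i);
            workers.add("ec2-worker-" + i);
        }
        SharedLeaseTable table = new SharedLeaseTable();
        table.balance(shards, workers);
        // With 8 shards and 8 workers, each worker ends up owning exactly one shard.
        System.out.println(table.shardsPerWorker());
    }
}
```

With fewer workers than shards (say 3 workers for 8 shards) the same table simply gives some workers more than one shard; no shard is ever owned by two workers.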

I thought that was the case, but this post suggests otherwise: Multiple different consumers of same Kinesis stream

If not, how can I achieve this?

All workers associated with this application name are assumed to be working together on the same stream. These workers may be distributed on multiple instances. If you run an additional instance of the same application code, but with a different application name, the KCL treats the second instance as an entirely separate application that is also operating on the same stream.

As mentioned here: https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-implementation-app-java.html#kinesis-record-processor-initialization-java

Reference:

https://www.amazonaws.cn/en/kinesis/data-streams/faqs/#recordprocessor
https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-scaling.html
https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-implementation-app-java.html#kinesis-record-processor-initialization-java

Upvotes: 2

Views: 1476

Answers (1)

r0ckj

Reputation: 41

The KCL library needs a ConfigsBuilder, to which you pass streamName, applicationName, kinesisAsyncClient, etc. If you specify an application name associated with the stream name, then the KCL creates a DynamoDB table with the application name and uses the table to maintain state information.

So if you have multiple streams, create multiple software.amazon.kinesis.common.ConfigsBuilder instances, each with its own streamName and associated applicationName, and pass each ConfigsBuilder's configs to its own software.amazon.kinesis.coordinator.Scheduler.

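This way you will have one DynamoDB table per stream, and the instances of your multi-instance app will not process the same shard at the same time. (Delivery is still at-least-once: records after the last checkpoint can be delivered again, for example after a worker fails over.)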
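A plain-Java sketch of the bookkeeping this answer describes, with no KCL dependency: one config per stream, each with its own application name, so no two streams end up sharing a lease table. It relies on the KCL default that the lease table is named after the application name. The record `StreamConsumerConfig`, the `-consumer` suffix and the stream names are hypothetical stand-ins; in real code each entry would become a ConfigsBuilder whose configs go to its own Scheduler.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not KCL code. Each entry stands in for one ConfigsBuilder/Scheduler pair.
public class PerStreamConfigs {

    // Hypothetical holder for the two values the answer says to pass per stream.
    record StreamConsumerConfig(String streamName, String applicationName) {
        // By default the KCL names its DynamoDB lease table after the application name.
        String leaseTableName() {
            return applicationName;
        }
    }

    // Derive one application name per stream; the "-consumer" suffix is an arbitrary choice.
    static List<StreamConsumerConfig> configsFor(List<String> streamNames) {
        return streamNames.stream()
                .map(s -> new StreamConsumerConfig(s, s + "-consumer"))
                .toList();
    }

    // True if no two configs would share (and therefore mix up) one lease table.
    static boolean tablesAreDistinct(List<StreamConsumerConfig> configs) {
        Set<String> seen = new HashSet<>();
        for (StreamConsumerConfig c : configs) {
            if (!seen.add(c.leaseTableName())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<StreamConsumerConfig> configs = configsFor(List.of("orders", "payments"));
        for (StreamConsumerConfig c : configs) {
            System.out.println(c.streamName() + " -> lease table " + c.leaseTableName());
        }
        System.out.println("distinct tables: " + tablesAreDistinct(configs));
    }
}
```

Reusing one application name for two different streams is exactly the case `tablesAreDistinct` flags, since both consumers would then read and write the same lease table.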

Upvotes: 1
