naumcho

Reputation: 19881

Status of kinesis stream reader

How do I tell what percentage of the data in a Kinesis stream a reader has already processed? I know each reader has a per-shard checkpoint sequence number, and I can get the StartingSequenceNumber of each shard from describe-stream. However, I don't know how far along in the data the reader currently is, because I don't know the latest sequence number of the shard.

I was thinking of getting a LATEST iterator for each shard and reading the last record's sequence number, but that doesn't seem to work if no new data has arrived since I got the LATEST iterator.

Any ideas or tools for doing this out there?

Thanks!

Upvotes: 4

Views: 1032

Answers (2)

isaac.hazan

Reputation: 3864

If you use the KCL, you can do that by comparing IncomingRecords, one of the built-in CloudWatch metrics for Kinesis, with RecordsProcessed, a custom metric published by the KCL.

Then select a time range and interval of, say, 1 day.

You would then get the following type of graphs:

[Image: CloudWatch graph comparing IncomingRecords and RecordsProcessed]

As you can see, many more records were added than processed. By looking at the value at each point, you can tell exactly whether your processor is behind.
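The same comparison can also be done outside the console. The following is only a rough sketch, not the KCL's own tooling: it assumes `cloudwatch` is a boto3 CloudWatch client, that the KCL publishes its metrics under a namespace equal to the application name, and that RecordsProcessed sits under an `Operation=ProcessTask` dimension. Check the namespace and dimensions in your own CloudWatch console, since they can differ by KCL version and configuration. The names `sum_metric` and `backlog` are made up for this example.

```python
from datetime import datetime, timedelta, timezone


def sum_metric(cloudwatch, namespace, metric_name, dimensions, start, end, period=3600):
    """Sum one CloudWatch metric over the window [start, end]."""
    resp = cloudwatch.get_metric_statistics(
        Namespace=namespace,
        MetricName=metric_name,
        Dimensions=dimensions,
        StartTime=start,
        EndTime=end,
        Period=period,          # seconds per datapoint
        Statistics=["Sum"],
    )
    return sum(dp["Sum"] for dp in resp["Datapoints"])


def backlog(cloudwatch, stream_name, kcl_app_name, hours=24, kcl_dimensions=None):
    """Records added minus records processed over the last `hours` hours."""
    if kcl_dimensions is None:
        # Assumption: application-level KCL metric dimension.
        kcl_dimensions = [{"Name": "Operation", "Value": "ProcessTask"}]
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    incoming = sum_metric(
        cloudwatch, "AWS/Kinesis", "IncomingRecords",
        [{"Name": "StreamName", "Value": stream_name}], start, end,
    )
    processed = sum_metric(
        cloudwatch, kcl_app_name, "RecordsProcessed", kcl_dimensions, start, end,
    )
    return incoming - processed


# Usage (requires AWS credentials; the stream and app names are placeholders):
#   import boto3
#   print(backlog(boto3.client("cloudwatch"), "my-stream", "my-kcl-app"))
```

A positive result means more records arrived than were processed in that window. It is a rough indicator, not an exact count of unprocessed records.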

Upvotes: 1

Chris Riddell

Reputation: 1024

I suggest you implement a custom metric or metrics in your applications to track this.

For example, you could embed a send timestamp in each Kinesis message and, when processing the message, record the time difference as an AWS CloudWatch custom metric. This would indicate how close your consumer is to the front of the stream.
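A minimal sketch of that idea, assuming the producer writes JSON messages with a hypothetical `sent_at_ms` field (epoch milliseconds). The field name, the metric name `ConsumerLag`, and the namespace in the usage comment are all invented for illustration:

```python
import json
import time


def consumer_lag_ms(record_data, now_ms=None):
    """Milliseconds between the producer's send time and now."""
    payload = json.loads(record_data)
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Clamp at zero in case producer and consumer clocks disagree.
    return max(0, now_ms - payload["sent_at_ms"])


def lag_metric_datum(lag_ms):
    """Build one entry for the MetricData list of put_metric_data."""
    return {"MetricName": "ConsumerLag", "Value": lag_ms, "Unit": "Milliseconds"}


# Usage inside the consumer (requires AWS credentials):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   lag = consumer_lag_ms(record["Data"])
#   cloudwatch.put_metric_data(Namespace="MyApp/Kinesis",
#                              MetricData=[lag_metric_datum(lag)])
```

Publishing one datapoint per record can get expensive on a busy stream, so you may prefer to report the lag once per batch.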

You could also record the number of messages pushed (at the pushing application) and messages received at the Kinesis consumer. If you compare these in a chart on CloudWatch, you could see that the curves roughly follow each other indicating that the consumer is doing a good job at keeping up with the workload.

You could also try monitoring your Kinesis consumer to see how often it waits idly for records (i.e., Kinesis returns no results, suggesting the consumer is at the front of the stream and all records have been processed).

Also note that there is no way to track a "percent" processed in the stream, since Kinesis messages expire after the stream's retention period, which is 24 hours by default (so the total number of messages is constantly rolling). There is also no direct API function to count the number of messages inside your stream (unless you have recorded this yourself as described above).
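One way to track this, sketched with a hypothetical `IdlePollTracker` helper (not part of any AWS library): count how many polls come back empty and report the fraction. An empty result is only a heuristic, because a read can occasionally return no records even when the consumer is not fully caught up.

```python
class IdlePollTracker:
    """Tracks what fraction of polls returned no records."""

    def __init__(self):
        self.total_polls = 0
        self.empty_polls = 0

    def observe(self, records):
        """Call once per poll with the list of records that came back."""
        self.total_polls += 1
        if not records:
            self.empty_polls += 1

    def idle_ratio(self):
        """Fraction of polls that were empty; 0.0 before any poll."""
        if self.total_polls == 0:
            return 0.0
        return self.empty_polls / self.total_polls


# Usage in a polling loop (the `kinesis` client and `iterator` are assumed):
#   tracker = IdlePollTracker()
#   resp = kinesis.get_records(ShardIterator=iterator)
#   tracker.observe(resp["Records"])
```

A ratio that stays near zero suggests the consumer always has work waiting, while a high ratio suggests it is keeping up with the stream.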

Upvotes: 1
