AnonymousAlias

Reputation: 1399

Is there a way to determine where messages came from in a Kafka topic?

Large amounts of data are being pushed into one of our Kafka topics. Is there a way to determine which producer this data is coming from?

Upvotes: 7

Views: 5907

Answers (4)

wh.pei

Reputation: 1

  1. Use tcpdump on the Kafka broker to capture packets sent to port 9092.
  2. Use Wireshark to filter on the specific Kafka topic (kafka.topic.name == "xxx").
  3. Check the message body to find the right packet, then trace it back to its source.

Upvotes: 0

Dibyajyoti Ghosal

Reputation: 41

You can use headers and hardcode a producer ID in the header before producing. That's what I'm doing in Node.js using rdkafka; Java should support it too.
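A minimal sketch of the header idea in Python, kept client-agnostic so it runs without a broker. The header name `producer-id` and both helper functions are hypothetical, not part of any Kafka client. The assumption is that your client accepts headers as a list of `(str, bytes)` pairs on produce and hands them back in the same shape on consume; check your client's documentation before relying on that.

```python
from collections import Counter

# Hypothetical header name; any key your teams agree on would do.
PRODUCER_HEADER = "producer-id"


def tag_headers(producer_id, extra=None):
    """Return message headers with the producer ID appended.

    Headers are modelled as a list of (str, bytes) pairs, which is an
    assumption about the shape your Kafka client expects.
    """
    headers = list(extra or [])
    headers.append((PRODUCER_HEADER, producer_id.encode("utf-8")))
    return headers


def bytes_by_producer(messages):
    """Sum payload bytes per producer ID.

    `messages` is an iterable of (headers, value) pairs as read from the
    topic. Messages without the header are counted under "unknown".
    """
    totals = Counter()
    for headers, value in messages:
        producer = "unknown"
        for key, raw in headers or []:
            if key == PRODUCER_HEADER and raw is not None:
                producer = raw.decode("utf-8", errors="replace")
        totals[producer] += len(value or b"")
    return totals
```

On the producing side you would pass `tag_headers("my-service")` as the headers argument of your client's produce call. On the consuming side, feeding a sample of the topic through `bytes_by_producer` shows which producer ID accounts for most of the volume. This only works for producers that cooperate and set the header.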

Upvotes: 2

OneCricketeer

Reputation: 191748

Without SASL or Authorizer-level auditing, no, there is no easy way other than tracking down a suspicious connected client-id via JMX.

I would suggest you enforce a standard message format and spread the word to producer teams. For example, look at the CloudEvents spec, which includes a source field:

https://github.com/cloudevents/spec/blob/master/kafka-protocol-binding.md
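As a rough sketch of what that could look like, assuming the binary content mode of the Kafka binding (CloudEvents attributes carried as `ce_`-prefixed message headers) and headers modelled as `(str, bytes)` pairs: the function names, the source path, and the event type in the usage note are made up for illustration.

```python
import uuid


def cloudevent_headers(source, event_type, event_id=None):
    """Build the required CloudEvents attributes as Kafka headers.

    Assumes binary content mode, where each attribute becomes a header
    named "ce_<attribute>" with a UTF-8 string value.
    """
    attrs = {
        "specversion": "1.0",
        "id": event_id or str(uuid.uuid4()),
        "source": source,      # identifies the producing system
        "type": event_type,
    }
    return [("ce_" + name, value.encode("utf-8")) for name, value in attrs.items()]


def event_source(headers):
    """Return the ce_source value of a consumed message, or None if absent."""
    for key, raw in headers or []:
        if key == "ce_source":
            return raw.decode("utf-8")
    return None
```

A producer team would call something like `cloudevent_headers("/teams/billing/invoicer", "com.example.invoice.created")`, and anyone inspecting the topic can read the origin back with `event_source(headers)`.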

Upvotes: 5

mazaneicha

Reputation: 9427

You can enable quotas for the clients/users, and then monitor which clients get throttled via two quota-related JMX MBeans - bandwidth and request rate:

Metric: Bandwidth quota metrics per (user, client-id), user or client-id
MBean: kafka.server:type={Produce|Fetch},user=([-.\w]+),client-id=([-.\w]+)
What it shows: Two attributes. throttle-time indicates the amount of time in ms the client was throttled. Ideally = 0. byte-rate indicates the data produce/consume rate of the client in bytes/sec. For (user, client-id) quotas, both user and client-id are specified. If per-client-id quota is applied to the client, user is not specified. If per-user quota is applied, client-id is not specified.

Metric: Request quota metrics per (user, client-id), user or client-id
MBean: kafka.server:type=Request,user=([-.\w]+),client-id=([-.\w]+)
What it shows: Two attributes. throttle-time indicates the amount of time in ms the client was throttled. Ideally = 0. request-time indicates the percentage of time spent in broker network and I/O threads to process requests from client group. For (user, client-id) quotas, both user and client-id are specified. If per-client-id quota is applied to the client, user is not specified. If per-user quota is applied, client-id is not specified.
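Once you have those metrics exported somewhere, ranking producers by byte-rate is straightforward. Here is a small sketch that assumes you have already collected MBean names and attribute values into a dict (via jconsole, a JMX exporter, or similar; the collection step is not shown). The function name and the input shape are hypothetical, and the name pattern simply follows the one quoted above.

```python
import re

# Follows the MBean pattern quoted above; user and client-id are each
# optional because only one may be present depending on the quota type.
MBEAN = re.compile(
    r"^kafka\.server:type=(?P<type>Produce|Fetch|Request)"
    r"(?:,user=(?P<user>[-.\w]+))?"
    r"(?:,client-id=(?P<client_id>[-.\w]+))?$"
)


def top_producers(samples, limit=5):
    """Rank Produce quota metrics by byte-rate, highest first.

    `samples` maps an MBean name to a dict of its attributes, e.g.
    {"byte-rate": 1024.0, "throttle-time": 0.0}.
    Returns tuples of (byte_rate, throttle_time, user, client_id).
    """
    rows = []
    for name, attrs in samples.items():
        match = MBEAN.match(name)
        if not match or match.group("type") != "Produce":
            continue  # skip Fetch/Request metrics and unrelated MBeans
        rows.append((
            attrs.get("byte-rate", 0.0),
            attrs.get("throttle-time", 0.0),
            match.group("user"),
            match.group("client_id"),
        ))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]
```

The first entry returned is the user or client-id pushing the most bytes per second. A non-zero throttle-time next to it means the quota is already slowing that client down.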

Upvotes: 4
