Dario R

Reputation: 63

Filebeat with 2 kafka outputs

I have an issue with Filebeat when I try to send log data to 2 Kafka nodes at the same time.

The following is the Output Kafka section of the filebeat.yml file:

output.kafka:
  enabled: true
  hosts: [ "192.168.xxx.xx:9092", "192.168.zzz.zz:9092" ]
  topic: "syslog"
  timeout: 30s
  max_message_bytes: 1000000

Both Kafka services are running, but only the second node gets data. That is, only the Kafka node 192.168.zzz.zz receives the data Filebeat has sent.

If I swap the IP addresses, it is again the node at the second address that gets the log data.

Why does that happen? What other configuration is needed to implement this use case? I need to send the data to both Kafka outputs.

Upvotes: 0

Views: 2415

Answers (2)

Alexandre Juma

Reputation: 3323

As cricket_007 explained, that hosts array should contain only nodes from the same Kafka cluster, because they are used to bootstrap your connection to the cluster. Bootstrapping works by providing the address of one, some, or all nodes of your cluster so the Kafka producer can receive the blueprint (metadata) describing the whole cluster.

Furthermore, when you say you can't see your message on one of the nodes, I get the feeling they are not part of the same Kafka cluster. If you can't see your data when you try to consume a topic from your "other" server, that points the same way: if the nodes were part of the same cluster, you'd be able to consume the topic from either of them, even if no partition (leader or replica) is present on that specific node.

When you consume, your consumer fetches the cluster metadata (historically via ZooKeeper, nowadays from the brokers themselves) so it can connect to the right nodes/partitions for the topic, so consuming doesn't actually depend on which machine you run your consumer against.

So this answer assumes that you actually want to produce your messages to two different Kafka clusters.

In this case, since Filebeat doesn't support multiple output blocks of the same output type, you can use one of the simplest solutions I know of for mirroring kafka-to-kafka: https://docs.confluent.io/current/connect/kafka-connect-replicator/index.html
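Replicator runs as a Kafka Connect connector. A minimal sketch of its configuration, using the question's two addresses as the (assumed) source and destination clusters, might look like this; property names follow Confluent's Replicator documentation, but check the version you deploy:

```properties
name=syslog-replicator
connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
# Source cluster: the one Filebeat actually produces to
src.kafka.bootstrap.servers=192.168.xxx.xx:9092
# Destination cluster: the second Kafka that should receive a copy
dest.kafka.bootstrap.servers=192.168.zzz.zz:9092
# Mirror only the topic Filebeat writes to
topic.whitelist=syslog
# Copy records byte-for-byte, without deserializing them
key.converter=io.confluent.connect.replicator.util.ByteArrayConverter
value.converter=io.confluent.connect.replicator.util.ByteArrayConverter
```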

By replicating, you can achieve the same result: instead of having Filebeat send to two Kafka clusters, you send to only one and then mirror your topic to the second cluster.

Upvotes: 2

OneCricketeer

Reputation: 191983

Assuming both broker URLs belong to the same cluster, only one address is needed to bootstrap the connection and discover the rest of the cluster. If one of those addresses is not reachable, the other is tried.

If Filebeat is producing messages with null keys, then messages should be spread evenly across the partitions of the specified topic in whichever cluster it is connected to.

Data is only sent to the leader of the calculated partition (based on the message key); therefore, a single message cannot be sent to "two nodes (of the same cluster) at the same time". Also, if the Kafka cluster contained more than those two servers, the node that gets the data could be one that is not among the addresses you've listed.
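That routing logic can be sketched in a few lines. This is a hedged illustration, not Kafka's actual implementation: the real default partitioner hashes key bytes with murmur2 (and newer clients use "sticky" batching for null keys); `crc32` and a plain round-robin counter stand in here purely to show why a record goes to exactly one partition's leader.

```python
# Simplified model of Kafka producer partition selection.
import zlib
from itertools import count
from typing import Optional

NUM_PARTITIONS = 6      # partitions in a hypothetical "syslog" topic
_round_robin = count()  # state for spreading null-key records

def choose_partition(key: Optional[bytes]) -> int:
    """Pick the single partition a record would be routed to."""
    if key is None:
        # Null key (Filebeat's case): records are spread evenly
        # across partitions rather than pinned to one.
        return next(_round_robin) % NUM_PARTITIONS
    # Keyed record: deterministic partition from the key's hash, so
    # the same key always lands on the same partition -- and is sent
    # only to that partition's current leader broker.
    return zlib.crc32(key) % NUM_PARTITIONS
```

Either way, each record maps to exactly one partition, and the producer sends it to that partition's leader only; replication to follower brokers happens inside the cluster, not from the producer.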

I don't think Filebeat can output to more than one distinct Kafka cluster at once, at least not within a single output.kafka section. Logstash might work better for that use case.
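For reference, Logstash does allow repeating an output plugin, so a pipeline with two kafka outputs is possible. A hedged sketch, reusing the question's addresses and topic as placeholders (this still assumes the two IPs really are separate clusters):

```
output {
  kafka {
    bootstrap_servers => "192.168.xxx.xx:9092"
    topic_id => "syslog"
  }
  kafka {
    bootstrap_servers => "192.168.zzz.zz:9092"
    topic_id => "syslog"
  }
}
```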

Upvotes: 4
