Reputation: 71
We are planning to implement Kafka to collect logs from all kinds of devices. We expect to have around 10,000 devices. Can we connect all of these devices directly to the Kafka cluster, or should we funnel the logs through log servers to limit the number of connections to Kafka? We plan to have one topic per kind of device (Linux, AIX, Windows 2003, Windows 2008, and so on). Thanks
Upvotes: 7
Views: 16459
Reputation: 357
I would say the relevant metric is the number of messages per second each Kafka node would need to deliver. Kafka benchmarks very well, in the hundreds of thousands of messages per second per node, and throughput scales linearly with the number of nodes. If one or two nodes aren't enough, you can always add more nodes to increase throughput.
An old benchmark with 3 nodes was doing 800k messages (~80 MB) per second, with each message replicated to the other nodes.
You can read more in depth here: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
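To put those benchmark numbers against your 10k devices, here's a back-of-the-envelope capacity check. The per-device message rate is a hypothetical assumption (not from the benchmark or your question), so plug in your own numbers:

```python
# Rough capacity estimate from the 3-node benchmark (800k msgs/s, ~80 MB/s total).
# The per-device log rate below is an assumed figure for illustration only.
DEVICES = 10_000
MSGS_PER_DEVICE_PER_SEC = 100              # assumed average log rate per device

BENCHMARK_MSGS_PER_SEC = 800_000           # 3-node benchmark, replicated
BENCHMARK_BYTES_PER_SEC = 80 * 10**6
PER_NODE_MSGS_PER_SEC = BENCHMARK_MSGS_PER_SEC // 3   # ~266k msgs/s per node

msg_size = BENCHMARK_BYTES_PER_SEC // BENCHMARK_MSGS_PER_SEC  # ~100 bytes/msg
total_msgs_per_sec = DEVICES * MSGS_PER_DEVICE_PER_SEC        # 1,000,000 msgs/s
nodes_needed = -(-total_msgs_per_sec // PER_NODE_MSGS_PER_SEC)  # ceiling division

print(msg_size, total_msgs_per_sec, nodes_needed)
```

Even under that (aggressive) assumed rate, a handful of nodes covers the message volume; the connection count, not throughput, is the thing to watch.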
Edit: Kafka connections are tcp connections under the covers: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Network
Quote:
Kafka uses a binary protocol over TCP. The protocol
defines all apis as request response message pairs.
TCP socket connections are pretty lightweight and limited only by the available memory of the server being connected to. Since Kafka scales linearly, you should be able to scale out brokers and partition your topics for any load you anticipate.
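For the topic-per-kind layout from the question, the routing is just a mapping from device kind to topic name. This is an illustrative sketch with made-up topic names, not part of any Kafka API:

```python
# Hypothetical topic-per-device-kind routing; the "logs.<kind>" naming
# convention and the DEVICE_KINDS set are illustrative assumptions.
DEVICE_KINDS = {"linux", "aix", "windows2003", "windows2008"}

def topic_for(kind: str) -> str:
    """Map a device kind to its log topic, e.g. 'linux' -> 'logs.linux'."""
    kind = kind.lower()
    if kind not in DEVICE_KINDS:
        raise ValueError(f"unknown device kind: {kind}")
    return f"logs.{kind}"

# With a real client (e.g. kafka-python's KafkaProducer), each device or
# log server would then send along the lines of:
#   producer.send(topic_for("linux"), log_line.encode("utf-8"))
print(topic_for("aix"))
```

Each topic can then be partitioned independently, so a high-volume kind (say, Linux) can get more partitions than a rarer one without touching the others.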
Upvotes: 3