Reputation: 3787
In designing a system that uses Kafka to separate/parallelise units of work I have found that I have 2 choices:
1. Data -> manipulate data -> store in DB -> send ID as message -> load data from DB using ID in message -> ...
2. Data -> manipulate data -> send data as message -> load data from message -> ...
The second option gets rid of all the side-effecting code that saves and loads data in the DB. If I do this, my code is much nicer and a unit of work can sometimes become a pure function. It also puts less load on the DB. The downside is that the message may be large, and messaging systems are usually designed to be fast with small messages.
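As a sketch of the second option (hypothetical names; the serialization format is an assumption), the whole payload travels in the message, so the transformation stays a pure function and no stage touches the DB:

```python
import json

def manipulate(data):
    """Pure function: transform the unit of work without touching the DB."""
    return {**data, "processed": True}

def to_message(data):
    """Serialize the full payload so the next stage needs no DB lookup."""
    return json.dumps(data).encode("utf-8")

def from_message(raw):
    """Downstream stage reconstructs the data from the message alone."""
    return json.loads(raw.decode("utf-8"))

# Stage 1: manipulate and emit the whole payload as the message value.
payload = to_message(manipulate({"order_id": 42, "items": ["a", "b"]}))

# Stage 2: no DB round-trip; the message is self-contained.
data = from_message(payload)
```

In the first option, `to_message` would instead carry only the ID, and `from_message` would have to query the DB, reintroducing the side effects.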
The questions I have are:
Upvotes: 3
Views: 4486
Reputation: 222521
There is nothing wrong with big messages in Kafka. One potential problem is that brokers and consumers have to decompress messages and therefore use their RAM, so large messages can put pressure on RAM (though I am not sure at what size the effect becomes visible).
The benchmarking page from LinkedIn has a good explanation of the effect of message size, so I will just leave it here:
I have mostly shown performance on small 100 byte messages. Smaller messages are the harder problem for a messaging system as they magnify the overhead of the bookkeeping the system does. We can show this by just graphing throughput in both records/second and MB/second as we vary the record size.
So, as we would expect, this graph shows that the raw count of records we can send per second decreases as the records get bigger. But if we look at MB/second, we see that the total byte throughput of real user data increases as messages get bigger:
We can see that with the 10 byte messages we are actually CPU bound by just acquiring the lock and enqueuing the message for sending—we are not able to actually max out the network. However, starting with 100 bytes, we are actually seeing network saturation (though the MB/sec continues to increase as our fixed-size bookkeeping bytes become an increasingly small percentage of the total bytes sent).
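The effect described in the quote can be illustrated with toy arithmetic (the 34-byte per-record overhead is an assumption for illustration, not an actual Kafka number): with a fixed bookkeeping cost per record, the fraction of bytes on the wire that are real user data grows as records get bigger.

```python
# Assumed fixed per-record bookkeeping cost, in bytes (illustrative only).
OVERHEAD = 34

def useful_fraction(record_size):
    """Fraction of transmitted bytes that are real user data."""
    return record_size / (record_size + OVERHEAD)

# Useful-data fraction for increasing record sizes.
fractions = {size: round(useful_fraction(size), 2)
             for size in (10, 100, 1000, 10000)}
```

Tiny records spend most of the wire on bookkeeping; large records amortize it away, which is why MB/sec throughput keeps climbing with record size even as records/sec falls.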
Based on this, I would not worry too much about the size of your message and would just go ahead with your second and easier solution.
Upvotes: 5
Reputation: 8161
The `message.max.bytes` property in the Kafka broker config defines the maximum size of a message that the server can receive. The default value is 1000000.
The documentation says:
The maximum size of a message that the server can receive. It is important that this property be in sync with the maximum fetch size your consumers use or else an unruly producer will be able to publish messages too large for consumers to consume.
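For reference, the related settings have to agree across broker, consumer, and producer. A sketch of the relevant properties (exact names vary between Kafka versions; the values here are illustrative):

```properties
# Broker (server.properties): largest message the broker will accept
message.max.bytes=1000000

# Broker: replicas must be able to fetch the largest accepted message
replica.fetch.max.bytes=1048576

# Consumer: must be at least message.max.bytes, or an unruly producer
# can publish messages too large to consume
# (old consumer: fetch.message.max.bytes)
max.partition.fetch.bytes=1048576

# Producer: refuse to send anything larger than the broker accepts
max.request.size=1048576
```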
Upvotes: 2