Reputation: 3787
In designing a system that uses Kafka to separate/parallelise units of work I have found that I have 2 choices:
1. Data -> manipulate data -> store in DB -> send ID as message -> load data from DB using ID in message -> ...
2. Data -> manipulate data -> send data as message -> load data from message -> ...
The second option gets rid of all the side-effecting code that saves and loads data in the DB. If I do this, my code is much nicer and a unit of work can sometimes become a pure function. It also puts less load on the DB. The downside is that the message may be large, and messaging systems are usually designed to be fast with small messages.
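As a sketch of the second option (hypothetical names; the serialization format is an assumption), the whole payload travels in the message, so the transformation stays a pure function and no stage touches the DB:

```python
import json

def manipulate(data):
    """Pure function: transform the unit of work without touching the DB."""
    return {**data, "processed": True}

def to_message(data):
    """Serialize the full payload so the next stage needs no DB lookup."""
    return json.dumps(data).encode("utf-8")

def from_message(raw):
    """Downstream stage reconstructs the data from the message alone."""
    return json.loads(raw.decode("utf-8"))

# Stage 1: manipulate and emit the whole payload as the message value.
payload = to_message(manipulate({"order_id": 42, "items": ["a", "b"]}))

# Stage 2: no DB round-trip; the message is self-contained.
data = from_message(payload)
```

In the first option, `to_message` would instead carry only the ID, and `from_message` would have to query the DB, reintroducing the side effects.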
The questions I have are:
Upvotes: 3
Views: 4486
Reputation: 222521
There is nothing wrong with big messages in Kafka. One potential problem is that brokers and consumers have to decompress messages and therefore use their RAM, so large messages can put pressure on RAM (though I am not sure at what size the effect becomes visible).
The benchmarking page from LinkedIn has a good explanation of the effect of message size, so I will just leave it here:
I have mostly shown performance on small 100 byte messages. Smaller messages are the harder problem for a messaging system as they magnify the overhead of the bookkeeping the system does. We can show this by just graphing throughput in both records/second and MB/second as we vary the record size.
So, as we would expect, this graph shows that the raw count of records we can send per second decreases as the records get bigger. But if we look at MB/second, we see that the total byte throughput of real user data increases as messages get bigger:
We can see that with the 10 byte messages we are actually CPU bound by just acquiring the lock and enqueuing the message for sending—we are not able to actually max out the network. However, starting with 100 bytes, we are actually seeing network saturation (though the MB/sec continues to increase as our fixed-size bookkeeping bytes become an increasingly small percentage of the total bytes sent).
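The effect described in the quote can be illustrated with toy arithmetic (the 34-byte per-record overhead is an assumption for illustration, not an actual Kafka number): with a fixed bookkeeping cost per record, the fraction of bytes on the wire that are real user data grows as records get bigger.

```python
# Assumed fixed per-record bookkeeping cost, in bytes (illustrative only).
OVERHEAD = 34

def useful_fraction(record_size):
    """Fraction of transmitted bytes that are real user data."""
    return record_size / (record_size + OVERHEAD)

# Useful-data fraction for increasing record sizes.
fractions = {size: round(useful_fraction(size), 2)
             for size in (10, 100, 1000, 10000)}
```

Tiny records spend most of the wire on bookkeeping; large records amortize it away, which is why MB/sec throughput keeps climbing with record size even as records/sec falls.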
Based on this, I would not worry too much about the size of your message and would just go ahead with your second and easier solution.
Upvotes: 5
Reputation: 8161
The `message.max.bytes` property in the Kafka broker config defines the maximum size of a message that the server can receive. The default value is 1000000.
The documentation says:
The maximum size of a message that the server can receive. It is important that this property be in sync with the maximum fetch size your consumers use or else an unruly producer will be able to publish messages too large for consumers to consume.
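For reference, the related settings have to agree across broker, consumer, and producer. A sketch of the relevant properties (exact names vary between Kafka versions; the values here are illustrative):

```properties
# Broker (server.properties): largest message the broker will accept
message.max.bytes=1000000

# Broker: replicas must be able to fetch the largest accepted message
replica.fetch.max.bytes=1048576

# Consumer: must be at least message.max.bytes, or an unruly producer
# can publish messages too large to consume
# (old consumer: fetch.message.max.bytes)
max.partition.fetch.bytes=1048576

# Producer: refuse to send anything larger than the broker accepts
max.request.size=1048576
```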
Upvotes: 2