Julian

Reputation: 4055

Chronicle queue POC returned unexpected latency

One of our systems has a microservice architecture using Apache Kafka as a service bus. Low latency is very important, but reliability and consistency (exactly-once delivery) matter even more.

When we performed load tests we noticed significant performance degradation, and all investigations pointed to large increases in producer and consumer latencies on the Kafka topics. No matter how much configuration we changed or how many resources we added, we could not get rid of the symptoms.

At the moment we need to process 10 transactions per second (TPS) and the load test exercises 20 TPS, but as the system evolves and gains functionality we know we will eventually need 500 TPS, so we started to worry about whether we can achieve this with Kafka.

As a proof of concept I switched one of our microservices to use a Chronicle Queue instead of a Kafka topic. It was easy to migrate by following the Avro example from the Chronicle-Queue-Demo GitHub repo:

import lombok.SneakyThrows;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueue;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;

public class MessageAppender {
    private static final String MESSAGES = "/tmp/messages";

    private final AvroHelper avroHelper;
    private final SingleChronicleQueue queue;
    private final ExcerptAppender messageAppender;

    public MessageAppender() {
        avroHelper = new AvroHelper();
        // Keep a reference to the queue so it can be closed on shutdown
        // instead of leaking the handle built inline.
        queue = SingleChronicleQueueBuilder.binary(MESSAGES).build();
        messageAppender = queue.acquireAppender();
    }

    @SneakyThrows
    public long append(Message message) {
        // Each call opens one excerpt and serializes a freshly built
        // Avro GenericRecord straight into the queue's wire.
        try (var documentContext = messageAppender.writingDocument()) {
            var paymentRecord = avroHelper.getGenericRecord();
            paymentRecord.put("id", message.getId());
            paymentRecord.put("workflow", message.getWorkflow());
            paymentRecord.put("workflowStep", message.getWorkflowStep());
            paymentRecord.put("securityClaims", message.getSecurityClaims());
            paymentRecord.put("payload", message.getPayload());
            paymentRecord.put("headers", message.getHeaders());
            paymentRecord.put("status", message.getStatus());
            avroHelper.writeToOutputStream(paymentRecord, documentContext.wire().bytes().outputStream());
            return messageAppender.lastIndexAppended();
        }
    }
}

After configuring that appender we ran a loop to produce 100_000 messages to the Chronicle queue. Every message had the same size, and the final size of the file was 621 MB. It took 22 minutes, 20 seconds and 613 milliseconds (~1341 seconds) to write all the messages, an average of about 75 messages/second.
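For what it's worth, those figures are internally consistent. A quick sanity check of the arithmetic (plain Java, no Chronicle dependency):

```java
public class ThroughputCheck {

    // Observed rate: messages written divided by elapsed seconds (~75 msg/s here).
    static double messagesPerSecond(int messages, double seconds) {
        return messages / seconds;
    }

    // Average message size derived from the final file size (~6.36 KB here).
    static double kbPerMessage(double totalBytes, int messages) {
        return totalBytes / messages / 1024.0;
    }

    public static void main(String[] args) {
        double seconds = 22 * 60 + 20 + 0.613;     // 22 min 20 s 613 ms
        double totalBytes = 621.0 * 1024 * 1024;   // 621 MB queue file
        System.out.printf("%.1f msg/s, %.2f KB/msg%n",
                messagesPerSecond(100_000, seconds),
                kbPerMessage(totalBytes, 100_000));
    }
}
```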

This was definitely not what we hoped for, and it is so far from the latencies advertised in the Chronicle documentation that I believe my approach is not the correct one. I admit our messages are not small, at about 6.36 KB/message, but I have no doubt that storing them in a database would be faster, so I still think I am not doing it right.

It is important that our messages are processed one by one.

Thank you in advance for your input and/or suggestions.

Upvotes: 0

Views: 300

Answers (1)

DarcyThomas

Reputation: 1288

Hand-building the Avro object each time seems like a bit of a code smell to me.

Can you create a predefined Message -> Avro serializer and use that to feed the queue?

Or, just for testing, create one Avro object outside the loop and feed that single object into the queue many times. That way you can see whether it is the building or the queuing that is the bottleneck.


More general advice:

Maybe attach a profiler and see if you are making an excessive number of object allocations, which is particularly bad if they are getting promoted to older generations.

See whether they are your objects or Chronicle Queue's.

Is your code maxing out your RAM, CPU, or network?

Upvotes: 0
