rakeeee

Reputation: 1073

kafka compression using high level consumer and simple consumer

In my application, we are using the Kafka high-level consumer, which consumes the decompressed data without any issues when the producer and consumer compress and decompress the data using the Java API.

What happens if the producer uses the librdkafka C++ API for compression (either Snappy or GZIP)? Is the Java consumer able to decompress transparently, as it did in the case mentioned above? What happens with the fetch size on the consumer side? Is that also handled transparently?

What happens if the Kafka consumer is built using the simple consumer model? Do we have to explicitly decompress the compressed data coming from the producer (assume the librdkafka C++ API is used here)?

I am thinking that the high-level consumer might not work when compression happens with the librdkafka C++ API on the producer side. Please correct me if I am wrong here, as I saw another post suggesting this: Kafka message codec - compress and decompress. In contrast, I found another link saying decompression is supposed to work if the high-level consumer is used: http://grokbase.com/t/kafka/users/142veppeyv/unable-to-consume-snappy-compressed-messages-with-simple-consumer.

Thanks

Upvotes: 2

Views: 3100

Answers (2)

Salvador Dali

Reputation: 222441

The main idea of all these distributed producers/brokers/consumers is that they work with each other seamlessly and transparently. This means you should not need to know (or care):

  • how producers are implemented
  • what compression they use (if any)
  • how many producers/brokers there are

Your consumer only needs to listen to its topic/partition and know what to do with the messages.

You can look at it as an analogy to the web: your browser does not care how SO is written, what server runs it, whether it uses gzip, and so on. As long as both of them speak HTTP, it will work.
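For illustration, here is a minimal sketch of a consumer written against the old high-level (ZooKeeper-based) Java API; the ZooKeeper address, group id and topic name are placeholder values. The point is that this code is identical whether or not the producer compressed the messages:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class PlainHighLevelConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder address
            props.put("group.id", "demo-group");              // placeholder group id

            ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // One stream for the topic; the iterator hands back plain payload bytes,
            // already decompressed, whether the producer used none, gzip or snappy.
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                    connector.createMessageStreams(Collections.singletonMap("my-topic", 1));
            ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();

            while (it.hasNext()) {
                byte[] payload = it.next().message(); // no codec handling in consumer code
                System.out.println(new String(payload));
            }
        }
    }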

Upvotes: 1

Edenhill

Reputation: 3113

They are compatible; librdkafka uses the same compression and framing as the Scala/Java client.

Increasing fetch.message.max.bytes allows the consumer to fetch larger messages, or larger batches of messages with each request, but it can usually be left to its default value unless your producers are producing messages larger than this value - in which case you will also need to increase message.max.bytes.
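As a sketch only (the 2 MB figure and connection details are made-up values, not recommendations), the consumer-side property and the matching broker setting could look like this:

    import java.util.Properties;

    // Illustrative values: raise the per-fetch limit to 2 MB because the producers
    // (hypothetically) send messages up to that size.
    public class FetchSizeConfig {
        static Properties largeMessageConsumerProps() {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "demo-group");              // placeholder
            // Must be at least as large as the biggest message the producers send.
            props.put("fetch.message.max.bytes", String.valueOf(2 * 1024 * 1024));
            return props;
        }
        // On the broker side this is a server.properties setting, not Java code:
        //   message.max.bytes=2097152
    }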

Compression is only configured on the producer; no configuration is required on the consumer side, since each message (or batch of messages) is flagged with its compression type (none, snappy, gzip, ..).
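For example, with the old (0.8-era) Java producer API the codec is a single producer property; librdkafka's equivalent producer property is also called compression.codec. The broker address and topic below are placeholders; a minimal sketch:

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class CompressingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092");             // placeholder broker
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("compression.codec", "snappy"); // or "gzip"; nothing to set on the consumer

            Producer<String, String> producer = new Producer<>(new ProducerConfig(props));
            producer.send(new KeyedMessage<>("my-topic", "hello, compressed world")); // placeholder topic
            producer.close();
        }
    }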

Upvotes: 4
