el323

Reputation: 2920

How to consume large messages from kafka topic

Kafka Version 1.1.0

I have a single node kafka broker with the following configs in config/server.properties:

Broker Configs:

message.max.bytes=100000000
max.message.bytes=100000000
replica.fetch.max.bytes=150000000
log.segment.bytes=1073741824 (Default)

Console consumer properties file has the following configs:

Consumer Properties:

receive.buffer.bytes=100000000
max.partition.fetch.bytes=100000000
fetch.max.bytes=52428800

I am producing a message whose size is about 20 KB. I produce it to a topic using the console producer, then start a console consumer on the topic, but it doesn't consume the complete message (it is cut off partway through).
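For reference, the reproduction steps look roughly like this; a minimal sketch, assuming the topic is named my_topic, the broker runs on localhost:9092, the consumer configs above are saved as consumer.properties, and the properties file is passed via --consumer.config:

# Produce: paste the ~20 KB message at the prompt
kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic

# Consume with the properties file applied
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --from-beginning --consumer.config consumer.properties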

I have looked into this post and tried the same settings, but it doesn't seem to work.

What am I missing here? Kindly help me out.

UPDATE:

> echo | xargs --show-limits

Your environment variables take up 3891 bytes
POSIX upper limit on argument length (this system): 2091213
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2087322
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647

UPDATE 1:

I have tested another scenario: this time I produce the same message using a Java producer instead of the console producer, and now when I consume I get the complete message.

Upvotes: 1

Views: 3338

Answers (1)

NanoPish

Reputation: 1501

Maybe the problem is occurring because you are using the console producer and pasting the message into the terminal (Linux), and the terminal truncates long input at a fixed maximum length.

You can run echo | xargs --show-limits or check other shell or terminal settings to find out.

The limit can also come from the operating system; for example, ARG_MAX:

getconf ARG_MAX

can be too small for your message.

The easiest way to check would be writing the file directly to kafka-console-producer, as in this example:

kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic --new-producer < my_file.txt

If that works correctly, it means the terminal limit was indeed the issue.
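To verify end to end, you can compare byte counts on both sides; a minimal sketch (the file and topic names are placeholders carried over from the example above):

# Size of the original message
wc -c my_file.txt

# Dump the topic to a file (stop the consumer with Ctrl-C), then compare sizes
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --from-beginning > consumed.txt
wc -c consumed.txt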


For the record, these settings should also be tested:

  • Consumer side: fetch.message.max.bytes - this determines the largest size of a message that can be fetched by the consumer.
  • Broker side: replica.fetch.max.bytes - this allows the replicas in the brokers to send messages within the cluster and makes sure the messages are replicated correctly. If this is too small, the message will never be replicated, and therefore the consumer will never see it, because the message will never be committed (fully replicated).
  • Broker side: message.max.bytes - this is the largest size of a message the broker will accept from a producer.
  • Broker side (per topic): max.message.bytes - this is the largest size of a message the broker will allow to be appended to the topic. This size is validated pre-compression. (Defaults to the broker's message.max.bytes; a way to set it per topic is sketched after this list.)
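A minimal sketch of setting the per-topic limit with the stock tooling (assuming Kafka 1.1.0 with ZooKeeper on localhost:2181; the topic name and value are placeholders):

kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name my_topic --add-config max.message.bytes=100000000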

Upvotes: 2
