Reputation: 943
I have a two-broker Kafka setup running on EC2, each broker with 4 x 4 GB GP2 SSDs. The topic has 6 partitions and 1 replica. The drives are mounted and I have set them up in server.properties. But when I was load testing my system and watching what was happening with the drives, 1 of the 4 drives on broker 1 had stored a lot of the data. Here is an example of what I got:
Broker 1 (note: I manually reproduced the figures for mount /a for this post):
Filesystem Size Used Avail Use% Mounted on
udev 16G 12K 16G 1% /dev
tmpfs 3.2G 344K 3.2G 1% /run
/dev/xvda1 7.8G 1.3G 6.1G 17% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 5.0M 0 5.0M 0% /run/lock
none 16G 0 16G 0% /run/shm
none 100M 0 100M 0% /run/user
/dev/xvdg 3.9G 8.0M 3.6G 1% /b
/dev/xvdf 3.9G 600M 3.2G 17% /a
/dev/xvdh 3.9G 8.0M 3.6G 1% /c
/dev/xvdi 3.9G 8.0M 3.6G 1% /d
Broker 2:
Filesystem Size Used Avail Use% Mounted on
udev 16G 12K 16G 1% /dev
tmpfs 3.2G 344K 3.2G 1% /run
/dev/xvda1 7.8G 1.3G 6.1G 17% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 5.0M 0 5.0M 0% /run/lock
none 16G 0 16G 0% /run/shm
none 100M 0 100M 0% /run/user
/dev/xvdg 3.9G 8.0M 3.6G 1% /b
/dev/xvdf 3.9G 8.0M 3.6G 1% /a
/dev/xvdh 3.9G 8.0M 3.6G 1% /c
/dev/xvdi 3.9G 8.0M 3.6G 1% /d
Can someone explain what is happening, and whether I have set something up wrong? I thought the data was supposed to be spread approximately evenly across all drives.
Upvotes: 2
Views: 2225
Reputation: 5024
When you send messages to Kafka, the producer uses a Partitioner implementation to work out, from each message's key, which partition to write the message into. The default Partitioner implementation uses a hashing function on the key. If you send all of your messages with the same key, they will all hash into the same partition. The same can happen with a small set of keys: with only a few distinct keys, hashing often produces an uneven distribution.
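To make the effect concrete, here is a minimal simulation sketch of hash-based partitioning. It does not use the Kafka client: `partition_for` and `distribution` are hypothetical helpers, and CRC32 is only a stand-in for whatever hash the real Partitioner applies, so the exact partition numbers will not match a real cluster. The partition count of 6 is taken from the question.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 6  # matches the topic described in the question


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition by hashing it.

    Assumption: CRC32 is a stand-in hash, not the one Kafka itself uses.
    """
    return zlib.crc32(key) % num_partitions


def distribution(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many messages land in each partition for the given keys."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# 1000 messages that all share one key: every message hits the same partition.
same_key = distribution([b"order"] * 1000)

# 1000 messages with distinct keys: the load spreads over several partitions.
many_keys = distribution(f"order-{i}".encode() for i in range(1000))

print("same key: ", dict(same_key))
print("many keys:", dict(sorted(many_keys.items())))
```

With a single key, the counter has exactly one entry, which is the pattern that would leave one drive much fuller than the others.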
Your best bet is to use a larger key set, or to configure the producer with a Partitioner that distributes messages more evenly, for example via round-robin. Whether you want to do this depends on whether some messages must be processed in order. If they must, make sure that related messages use the same key, and take this into account in your Partitioner.
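The trade-off above can be sketched as a small simulation. This is not the Kafka Partitioner interface: `KeyAwareRoundRobin` is a hypothetical class, and CRC32 again stands in for the real hash. It assumes that messages without a key have no ordering requirement and can be spread round-robin, while keyed messages must stay together.

```python
import itertools
import zlib
from collections import Counter


class KeyAwareRoundRobin:
    """Hypothetical partitioner: round-robin for unkeyed messages, hash for keyed ones."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        # Endless 0, 1, ..., n-1, 0, 1, ... sequence for unkeyed messages.
        self._next = itertools.cycle(range(num_partitions))

    def partition(self, key=None) -> int:
        if key is None:
            # No ordering requirement assumed: spread evenly.
            return next(self._next)
        # Related messages share a key, so they share a partition and keep their order.
        return zlib.crc32(key) % self.num_partitions


partitioner = KeyAwareRoundRobin(6)

# 600 unkeyed messages are split evenly across the 6 partitions.
unkeyed = Counter(partitioner.partition() for _ in range(600))

# 100 messages for one customer all land in a single partition.
keyed = {partitioner.partition(b"customer-42") for _ in range(100)}

print("unkeyed:", dict(sorted(unkeyed.items())))
print("keyed partitions:", keyed)
```

The design choice here is to give up even distribution only for the messages that actually need ordering. A real producer-side implementation would have to follow the client library's own Partitioner contract.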
Upvotes: 5