MrAureliusR

Reputation: 434

Why does MQTT use such a strange encoding scheme for remaining length?

I've recently started writing an MQTT library for a microcontroller. I've been following the specification document. Section 2.2.3 explains how the remaining length field (part of the fixed header) encodes the number of bytes to follow in the rest of the packet.

It uses a slightly odd encoding scheme:
    Byte 0 = a mod 128, a /= 128, if a > 0, set top bit and add byte 1
    Byte 1 = a mod 128, a /= 128, if a > 0, set top bit...
    etc.
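For what it's worth, here is roughly what that loop looks like in C. This is just a minimal sketch of the scheme as the spec describes it; the function name and signature are my own, not from any particular library.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical helper: writes the MQTT Remaining Length encoding of
     * value into out (at least 4 bytes). Returns the number of bytes
     * written (1 to 4), or 0 if value exceeds the maximum 268,435,455. */
    static size_t encode_remaining_length(uint32_t value, uint8_t out[4])
    {
        if (value > 268435455u)         /* 2^28 - 1, the 4-byte limit */
            return 0;

        size_t n = 0;
        do {
            uint8_t byte = value % 128; /* take the low 7 bits */
            value /= 128;
            if (value > 0)
                byte |= 0x80;           /* more bytes follow: set the top bit */
            out[n++] = byte;
        } while (value > 0);

        return n;                       /* 1 byte for values 0..127 */
    }

Encoding 127 gives the single byte 0x7F, while 128 becomes 0x80 0x01.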

This variable length encoding seems odd in this application. You could easily transmit the same number using fewer bytes, especially once you get into numbers that take 2-4 bytes using this scheme. MQTT was designed to be simple to use and implement. So why did they choose this scheme?

For example, decimal 15026222 would be encoded as 0xAE 0x90 0x95 0x07, whereas written directly in hexadecimal it's 0xE5482E -- three bytes instead of four. The overhead of computing this encoding and decoding it at the other end seems to contradict the idea that MQTT is supposed to be fast and simple to implement on an 8-bit microcontroller.
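To spell out where those four bytes come from, the division steps are:

    15026222 mod 128 = 46 (0x2E), more to follow, set top bit -> 0xAE; 15026222 / 128 = 117392
    117392   mod 128 = 16 (0x10), more to follow, set top bit -> 0x90; 117392   / 128 = 917
    917      mod 128 = 21 (0x15), more to follow, set top bit -> 0x95; 917      / 128 = 7
    7        mod 128 =  7 (0x07), nothing left, top bit stays clear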

What are the benefits to this encoding scheme? Why is it used? The only blog post I could find that even mentions any motivation is this one, which says:

The encoding of the remaining length field requires a bit of additional bit and byte handling, but the benefit is that only a single byte is needed for most messages while preserving the capability to send larger message up to 268’435’455 bytes.

But that doesn't make sense to me. You could have even more messages fit in a single byte if you used the entire first byte to represent 0-255 instead of 0-127. And with a straight four-byte binary length you could represent a number as large as 4 294 967 295 instead of only 268 435 455.

Does anyone have any idea why this was used?

Upvotes: 0

Views: 805

Answers (1)

Amit

Reputation: 46351

As the comment you cited explains, the scheme is built on the assumption that "only a single byte is needed for most messages": in other words, that most of the time a <= 127, so a single byte is enough to represent the value.
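To illustrate, here is a minimal C sketch of a decoder for that scheme (the function name is illustrative, not taken from any particular library):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical helper: reads an MQTT Remaining Length field from buf,
     * stores the decoded value in *value and returns the number of length
     * bytes consumed (1 to 4), or 0 if a fifth byte would be needed. */
    static size_t decode_remaining_length(const uint8_t *buf, uint32_t *value)
    {
        uint32_t result = 0;
        uint32_t multiplier = 1;
        size_t i = 0;

        do {
            if (i == 4)                /* the spec allows at most 4 length bytes */
                return 0;
            result += (uint32_t)(buf[i] & 0x7F) * multiplier;
            multiplier *= 128;
        } while (buf[i++] & 0x80);     /* top bit set: another byte follows */

        *value = result;
        return i;                      /* 1 for lengths 0..127 */
    }

The loop stops at the first byte whose top bit is clear, so the receiver never needs a separate field telling it how many length bytes to expect.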

The alternatives are:

  1. Use a separate field to explicitly indicate how many bytes (or bits) are used for a. This would require dedicating at least 2 bits in every message just to support an a of up to 4 bytes.

  2. Dedicate a fixed size to a, probably 4 bytes, for all messages. This wastes space when many (read: most) messages don't need that size, and it cannot support larger values if that ever becomes a requirement.

Upvotes: 2
