Reputation: 11330
I am documenting an old file format and have stumped myself with the following issue.
It seems that integers are variable-length encoded: numbers <= 0x7F are encoded in a single byte, while numbers >= 0x80 are encoded in two bytes. An example set of integers and their encoded counterparts:
0x390 is encoded as 0x9007
0x150 is encoded as 0xD002
0x82 is encoded as 0x8201
0x89 is encoded as 0x8901
I have yet to come across any numbers larger than 0xFFFF, so I can't be sure if/how they are encoded. For the life of me, I can't work out the pattern here. Any ideas?
Upvotes: 3
Views: 12412
Reputation: 65156
At a glance it looks like the numbers are split into 7-bit chunks, each of which is encoded as the 7 least significant bits of an output byte, while the most significant bit signals whether more bytes follow (i.e. the last byte of an encoded integer has 0 as its MSB).
The least significant chunk of the input comes first, so I guess you could call this "little endian". For example, 0x9007 decodes as: 0x90 has its MSB set, so take its low 7 bits (0x10) and keep going; 0x07 has its MSB clear, so it's the last byte, giving (0x07 << 7) | 0x10 = 0x390.
Edit: see https://en.wikipedia.org/wiki/Variable-length_quantity (variants of this are used in MIDI and Google protocol buffers, though MIDI stores the most significant group first, whereas this format, like protobuf varints, stores the least significant group first)
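Here's a minimal Python sketch of the scheme. The function names are mine, and extending the same rule beyond two bytes (for values above 0x3FFF) is an assumption the question's data can't confirm:

```python
def encode_varint(value):
    """Encode a non-negative integer as little-endian 7-bit groups.
    Every byte except the last has its MSB set (continuation flag)."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        chunk = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(chunk | 0x80)  # more bytes follow: set the MSB
        else:
            out.append(chunk)         # final byte: MSB clear
            return bytes(out)

def decode_varint(data, offset=0):
    """Decode one integer starting at `offset`; return (value, next_offset)."""
    value = 0
    shift = 0
    for i in range(offset, len(data)):
        byte = data[i]
        value |= (byte & 0x7F) << shift  # low chunks arrive first
        if not (byte & 0x80):            # MSB clear: this was the last byte
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")

# The examples from the question round-trip:
assert encode_varint(0x390) == bytes([0x90, 0x07])
assert encode_varint(0x150) == bytes([0xD0, 0x02])
assert encode_varint(0x82) == bytes([0x82, 0x01])
assert encode_varint(0x89) == bytes([0x89, 0x01])
assert decode_varint(bytes([0x90, 0x07]))[0] == 0x390
```

Under this rule a number above 0xFFFF would simply take more bytes (e.g. three bytes cover up to 0x1FFFFF), but whether the old format actually does that is a guess until you find such a value in the wild.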
Upvotes: 11