moinudin
moinudin

Reputation: 138357

Get byte representation of int, using only 3 bytes

What's a nice, readable way of getting the byte representation (i.e. a byte[]) of an int, but only using 3 bytes (instead of 4)? I'm using Hadoop/Hbase and their Bytes utility class has a toBytes function but that will always use 4 bytes.

Ideally, I'd also like a nice, readable way of encoding to as few bytes as possible, i.e. if the number fits in one byte then only use one.

Please note that I'm storing this in a byte[], so I know the length of the array and thus variable length encoding is not necessary. This is about finding an elegant way to do the cast.

Upvotes: 0

Views: 3406

Answers (6)

Pavel Zdenek
Pavel Zdenek

Reputation: 7293

If i understand right that you really, desperately want to save space, even at expense of arcane bit shuffling: any array type is an unecessary luxury because you cannot use less than one whole byte for the length = addressing space 256 while you know that at most 4 will be needed. So i would reserve 4 bits for the length and sign flag and cram the rest aligned to that number of bytes. You might even save one more byte if your MSB is less than 128. The sign flag i see useful for ability to represent negative numbers in less than 4 bytes too. Better have the bit there every time (even for positive numbers) than overhead of 4 bytes for representing -1.

Anyway, this all is a thin water until you make some statistics on your data set, how many integers are actually compressible and whether the compression overhead is worth the effort.

Upvotes: 0

erickson
erickson

Reputation: 269677

A general solution for this is impossible.

If it were possible, you could apply the function iteratively to obtain unlimited compression of data.

Your domain might have some constraints on the integers that allow them to be compressed to 24-bits. If there are such constraints, please explain them in the question.

A common variable size encoding is to use 7 bits of each byte for data, and the high bit as a flag to indicate when the current byte is the last.


You can predict the number of bytes needed to encode an int with a utility method on Integer:

int n = 4 - Integer.numberOfLeadingZeros(x) / 8;
byte[] enc = new byte[n];
while (n-- > 0) 
  enc[n] = (byte) ((x >>> (n * 8)) & 0xFF);

Note that this will encode 0 as an empty array, and other values in little-endian format. These aspects are easily modified with a few more operations.

Upvotes: 3

Maarten Bodewes
Maarten Bodewes

Reputation: 93968

Try using ByteBuffer. You can even set little endian mode if required:

int exampleInt = 0x11FFFFFF;
ByteBuffer buf = ByteBuffer.allocate(Integer.SIZE / Byte.SIZE);
final byte[] threeByteBuffer = new byte[3];
buf.putInt(exampleInt);
buf.position(1);
buf.get(threeByteBuffer);

Or the shortest signed, Big Endian:

BigInteger bi = BigInteger.valueOf(exampleInt);
final byte[] shortestSigned = bi.toByteArray();

Upvotes: 1

bmm6o
bmm6o

Reputation: 6505

You can start with something like this:

byte[] Convert(int i)
{  // warning: untested
  if (i == 0)
    return new byte[0];
  if (i > 0 && i < 256)
    return new byte[]{(byte)i};
  if (i > 0 && i < 256 * 256)
    return new byte[]{(byte)i, (byte)(i >> 8)};
  if (i > 0 && i < 256 * 256 * 256)
    return new byte[]{(byte)i, (byte)(i >> 8), (byte)(i >> 16)};
  return new byte[]{(byte)i, (byte)(i >> 8), (byte)(i >> 16), (byte)(i >> 24)};
}

You'll need to decide if you want to be little-endian or big-endian. Note that negative numbers are encoded in 4 bytes.

Upvotes: 0

Amir Pashazadeh
Amir Pashazadeh

Reputation: 7302

Convert your int to a 4 bytes array, and iterate it, if every high order byte is zero then remove it from array.

Something like:

byte[] bytes = toBytes(myInt);
int neededBytes = 4;
for (;neededBytes > 1; i--) {
    if (bytes[neededBytes - 1] != 0) {
       break;
    }
}

byte[] result = new byte[neededBytes];
// then just use array copy to copy first neededBytes to result.

Upvotes: 0

Bruno Reis
Bruno Reis

Reputation: 37822

If you need to represent the whole 2^32 existing 4-byte integers, you need to chose between:

  • fixed-size representation, using 4 bytes always; or
  • variable-size representation, using at least 5 bytes for some numbers.

Take a look on how UTF-8 encodes the Unicode charactes, you might get some insights. (you use some short prefix to describe how many bytes must be read for that unicode character, then you read that many bytes and interpret them).

Upvotes: 1

Related Questions