Reputation: 138357
What's a nice, readable way of getting the byte representation (i.e. a byte[]
) of an int
, but only using 3 bytes (instead of 4)? I'm using Hadoop/Hbase and their Bytes
utility class has a toBytes
function but that will always use 4 bytes.
Ideally, I'd also like a nice, readable way of encoding to as few bytes as possible, i.e. if the number fits in one byte then only use one.
Please note that I'm storing this in a byte[]
, so I know the length of the array and thus variable length encoding is not necessary. This is about finding an elegant way to do the cast.
Upvotes: 0
Views: 3406
Reputation: 7293
If i understand right that you really, desperately want to save space, even at expense of arcane bit shuffling: any array type is an unecessary luxury because you cannot use less than one whole byte for the length = addressing space 256 while you know that at most 4 will be needed. So i would reserve 4 bits for the length and sign flag and cram the rest aligned to that number of bytes. You might even save one more byte if your MSB is less than 128. The sign flag i see useful for ability to represent negative numbers in less than 4 bytes too. Better have the bit there every time (even for positive numbers) than overhead of 4 bytes for representing -1.
Anyway, this all is a thin water until you make some statistics on your data set, how many integers are actually compressible and whether the compression overhead is worth the effort.
Upvotes: 0
Reputation: 269677
A general solution for this is impossible.
If it were possible, you could apply the function iteratively to obtain unlimited compression of data.
Your domain might have some constraints on the integers that allow them to be compressed to 24-bits. If there are such constraints, please explain them in the question.
A common variable size encoding is to use 7 bits of each byte for data, and the high bit as a flag to indicate when the current byte is the last.
You can predict the number of bytes needed to encode an int
with a utility method on Integer
:
int n = 4 - Integer.numberOfLeadingZeros(x) / 8;
byte[] enc = new byte[n];
while (n-- > 0)
enc[n] = (byte) ((x >>> (n * 8)) & 0xFF);
Note that this will encode 0 as an empty array, and other values in little-endian format. These aspects are easily modified with a few more operations.
Upvotes: 3
Reputation: 93968
Try using ByteBuffer. You can even set little endian mode if required:
int exampleInt = 0x11FFFFFF;
ByteBuffer buf = ByteBuffer.allocate(Integer.SIZE / Byte.SIZE);
final byte[] threeByteBuffer = new byte[3];
buf.putInt(exampleInt);
buf.position(1);
buf.get(threeByteBuffer);
Or the shortest signed, Big Endian:
BigInteger bi = BigInteger.valueOf(exampleInt);
final byte[] shortestSigned = bi.toByteArray();
Upvotes: 1
Reputation: 6505
You can start with something like this:
byte[] Convert(int i)
{ // warning: untested
if (i == 0)
return new byte[0];
if (i > 0 && i < 256)
return new byte[]{(byte)i};
if (i > 0 && i < 256 * 256)
return new byte[]{(byte)i, (byte)(i >> 8)};
if (i > 0 && i < 256 * 256 * 256)
return new byte[]{(byte)i, (byte)(i >> 8), (byte)(i >> 16)};
return new byte[]{(byte)i, (byte)(i >> 8), (byte)(i >> 16), (byte)(i >> 24)};
}
You'll need to decide if you want to be little-endian or big-endian. Note that negative numbers are encoded in 4 bytes.
Upvotes: 0
Reputation: 7302
Convert your int
to a 4 byte
s array, and iterate it, if every high order byte is zero then remove it from array.
Something like:
byte[] bytes = toBytes(myInt);
int neededBytes = 4;
for (;neededBytes > 1; i--) {
if (bytes[neededBytes - 1] != 0) {
break;
}
}
byte[] result = new byte[neededBytes];
// then just use array copy to copy first neededBytes to result.
Upvotes: 0
Reputation: 37822
If you need to represent the whole 2^32 existing 4-byte integers, you need to chose between:
Take a look on how UTF-8 encodes the Unicode charactes, you might get some insights. (you use some short prefix to describe how many bytes must be read for that unicode character, then you read that many bytes and interpret them).
Upvotes: 1