Reputation: 35772
I want to take a long value in Java, and convert it to a byte array.
However, I want the representation to be small for small values, so perhaps if the value is less than 127 then it requires only a single byte.
The encoding and decoding algorithms should be extremely efficient.
I'm sure this has been done but I can't find any example code, anyone got any pointers?
Upvotes: 0
Views: 842
Reputation: 718798
(I might be stating the obvious to some people ... but here goes.)
If you are doing to reduce the size long
values in some external serialization, go ahead.
However, if you are trying to save memory in an Java program you are probably wasting your time. The smallest representation of a byte[]
in Java is either 2 or 3 32-bit words. And that is for a byte array of length zero. Add some multiple of 32 bit words to for any array length greater than zero. Then you've got to allow at least 1 32-bit word to hold the reference to the byte[]
object.
If you add that up, it takes at least 4 words to represent any given long
other than 0L
as a byte[]
.
The only case where you are going to get any saving is if you are representing a number of long
values in a single byte[]
. You will need at least 3 long
values before you can possibly break even, and even then if you will lose if the values turn out to be too large on average.
Upvotes: 0
Reputation: 533510
You can use stop bit encoding e.g.
public static void writeLong(OutputStream out, long value) throws IOException {
while(value < 0 || value > 127) {
out.write((byte) (0x80 | (value & 0x7F)));
value = value >>> 7;
}
out.write((byte) value);
}
public static long readLong(InputStream in) throws IOException {
int shift = 0;
long b;
long value = 0;
while((b = in.read()) >= 0) {
value += (b & 0x7f) << shift;
shift += 7;
if ((b & 0x80) == 0) return value;
}
throw new EOFException();
}
This is a fast form of compression, but all compression comes at a cost. (However if you are bandwidth limited it may be faster to transmit and worth the cost)
BTW: Values 0 to 127 use one byte
. You can use the same routine for short
and int
values as well.
EDIT: You can still use generic compression after this and it can be smaller than not using this as well.
public static void main(String... args) throws IOException {
long[] sequence = new long[1024];
Random rand = new Random(1);
for (int i = 0; i < sequence.length; i+=2) {
sequence[i] = (long) Math.pow(2, rand.nextDouble() * rand.nextDouble() * 61);
// some pattern.
sequence[i+1] = sequence[i] / 2;
}
testDeflator(sequence);
testStopBit(sequence);
testStopBitDeflator(sequence);
}
private static void testDeflator(long[] sequence) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(new DeflaterOutputStream(baos));
for (long l : sequence)
dos.writeLong(l);
dos.close();
System.out.println("Deflator used " + baos.toByteArray().length);
}
private static void testStopBit(long[] sequence) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
for (long l : sequence)
writeLong(baos, l);
baos.close();
System.out.println("Stop bit used " + baos.toByteArray().length);
}
private static void testStopBitDeflator(long[] sequence) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(new DeflaterOutputStream(baos));
for (long l : sequence)
writeLong(dos, l);
dos.close();
System.out.println("Stop bit & Deflator used " + baos.toByteArray().length);
}
public static void writeLong(OutputStream out, long value) throws IOException {
while (value < 0 || value > 127) {
out.write((byte) (0x80 | (value & 0x7F)));
value = value >>> 7;
}
out.write((byte) value);
}
Prints
Deflator used 3492
Stop bit used 2724
Stop bit & Deflator used 2615
What works best is highly dependant on the data you are sending. e.g. If your data is truly random, any compression technique you use will only make the data larger.
The Deflator is a stripped down version of the GZip output (minus a header and CRC32)
Upvotes: 5
Reputation: 346300
Simply use a GZipOutputStream
- entropy encoding like GZip basically does exactly what you describe, just generically.
Edit: Just to be sure: do you realize that a variable-length encoding that uses only 1 byte for small numbers necessarily needs to use more than 8 bytes for most large ones? Unless you know that you'll have far more small than large numbers, it could even end up increasing the overall size of your data. Whereas GZIP adapts to your actual data set and can compress data sets that are skewed in different ways.
Upvotes: 2
Reputation: 114767
If you want to store long
values with different lengths, then you'll need a delimiter, otherwise you can't decide, which byte belongs to which long value... And the delimiters will add extra bytes to the data...
If you're looking for a fast library to store long values (with 64Bit each), I'd recommend colt. It is fast.
Upvotes: 0