Reputation: 599
For a project I work a lot with large volumes of encrypted data that is read-heavy. Since decryption takes much longer than inflating, I am willing to deflate the data before encrypting and storing it.
The difficulty I am facing is that the data is stored in fixed-length chunks or pages. These chunks need to remain fixed length on disk to allow fast page lookups, so essentially I am trying to deflate as much data as possible into a fixed-size page.
At the moment I am still looking for a good approach to do this. In particular I am stuck on tracking the compressed size each time data is added, once the output gets near the page limit (data can in theory also grow slightly under compression if its entropy is very high). Currently I am trying the following approach:
final Deflater deflater = new Deflater(); //Deflater.HUFFMAN_ONLY
final Inflater inflater = new Inflater();
long start;
long duration;
int freeSpace = size;
int fill = 0;
byte[] page;
final byte[] buf = new byte[8];

deflater.reset();
try (ByteArrayOutputStream boas = new ByteArrayOutputStream(size);
     DeflaterOutputStream dos = new DeflaterOutputStream(boas, deflater, size, true)) {
    start = System.currentTimeMillis();
    while (true) {
        // generate an 8-byte test record and write it to the deflater stream
        long compressable = ThreadLocalRandom.current().nextLong(30) + 100;
        fill += ByteTools.longToByteArray(compressable, buf, 0, 8);
        dos.write(buf);
        // compare the compressed output produced so far against the page size
        freeSpace = size - boas.size();
        if (freeSpace < 16) {
            System.out.println(boas.size());
            dos.finish();
            System.out.println(boas.size());
            page = boas.toByteArray();
            break;
        }
    }
    duration = System.currentTimeMillis() - start;
}
The above code works for deflating, but the length of the output increases dramatically on dos.finish(), which is not surprising. Is there a good way of determining the resulting output size, or are there other compression schemes that are better suited for this task?
Since padding can be applied there is no need for a 100% accurate output size; a range of 95%-100% would be perfect and performant enough. Of course exceeding 100% must be prevented at all times.
Based on trial and error I adapted the routine a bit, which gives me nice results. However, I do not feel very comfortable with this solution yet:
while (true) {
    long compressable = ThreadLocalRandom.current().nextLong(30) + 100;
    block += ByteTools.longToByteArray(compressable, buf, 0, 8);
    dos.write(buf);
    if (block >= check) {
        // sync-flush so boas.size() reflects the real compressed size so far
        dos.flush();
        fill += block;
        block = 0;
        // next check point: the remaining space in the page, minus one record
        check = (size - boas.size()) - 8;
        System.out.println(check);
    }
    if (check < 16) {
        fill += block;
        dos.finish();
        page = boas.toByteArray();
        break;
    }
}
The solution has a compression ratio that is not far from the original compression ratio (compressing everything in one block) and stays within 8 bytes of the required output size. The check size decreases as follows:
16384
8088
4259
2207
1110
540
246
94
32
3
resulting in 9 flushes during the page generation and 1 finish.
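For reference, the same bookkeeping can also be done against the raw Deflater API instead of DeflaterOutputStream. The following is only a rough sketch of that idea (the margin value and the scratch-buffer sizing are assumptions; ThreadLocalRandom stands in for the random source, and ByteTools is the same helper as above): after a SYNC_FLUSH every pending input byte has been compressed, so the running byte count is exact at each check.

import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.zip.Deflater;

static byte[] fillPage(int size) {
    final int margin = 32;                      // assumed safety margin: one record plus finish() overhead
    final Deflater deflater = new Deflater();
    final byte[] scratch = new byte[size + 64]; // slack so a flush can never overflow mid-record
    final byte[] buf = new byte[8];
    int produced = 0;                           // exact number of compressed bytes emitted so far

    while (size - produced > margin) {
        long compressable = ThreadLocalRandom.current().nextLong(30) + 100;
        ByteTools.longToByteArray(compressable, buf, 0, 8);
        deflater.setInput(buf);
        // SYNC_FLUSH forces all pending input out, so 'produced' is accurate after this loop.
        while (!deflater.needsInput()) {
            produced += deflater.deflate(scratch, produced, scratch.length - produced, Deflater.SYNC_FLUSH);
        }
    }

    deflater.finish();
    while (!deflater.finished()) {
        produced += deflater.deflate(scratch, produced, scratch.length - produced);
    }
    deflater.end();

    if (produced > size) {                      // the margin was too small
        throw new IllegalStateException("compressed data exceeded the page size");
    }
    return Arrays.copyOf(scratch, size);        // pad with zeros up to the fixed page length
}

Note that each sync flush adds roughly five bytes of overhead and ends the current deflate block, so flushing on every record, as this sketch does, costs some compression ratio; flushing only when close to the limit, as in the routine above, keeps that cost down.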
Upvotes: 1
Views: 591
Reputation: 112394
deflate is not well suited for this, but it can be coerced into getting very close to filling a block if you let it try a few times. Take a look at fitblk, which does exactly what you are asking for, by doing three compression passes, including two decompressions in between them.
The idea is to compress more than your block size, decompress just your block size, and then recompress only what was decompressed. You do that twice to get very close to, or a lot of the time, exactly filling the block.
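To make that concrete, here is a rough Java sketch of the three-pass scheme built on java.util.zip. It is only an illustration of the idea, not fitblk itself; the EXCESS headroom and the helper methods are assumptions.

import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public final class FitBlock {

    private static final int EXCESS = 256;   // assumed headroom for the first two passes

    /** Fit as much of 'input' as possible into at most 'pageSize' compressed bytes. */
    static byte[] fit(byte[] input, int pageSize) throws DataFormatException {
        byte[] pass1 = compress(input, pageSize + EXCESS);
        if (pass1.length <= pageSize) {
            return pass1;                                    // everything fits on the first try
        }
        // Decompress only the first pageSize bytes: roughly what one page can hold.
        byte[] raw1 = decompressPrefix(pass1, pageSize, input.length);
        byte[] pass2 = compress(raw1, pageSize + EXCESS);
        if (pass2.length <= pageSize) {
            return pass2;                                    // usual case after the second pass
        }
        // Still too big: trim again by the same trick and recompress a third time.
        byte[] raw2 = decompressPrefix(pass2, pageSize, raw1.length);
        return compress(raw2, pageSize);                     // almost always fits now
    }

    /** Compress 'data', stopping once 'limit' output bytes have been produced. */
    private static byte[] compress(byte[] data, int limit) {
        Deflater def = new Deflater(Deflater.BEST_COMPRESSION);
        def.setInput(data);
        def.finish();
        byte[] out = new byte[limit];
        int n = 0;
        while (n < limit && !def.finished()) {
            n += def.deflate(out, n, limit - n);
        }
        def.end();
        return Arrays.copyOf(out, n);
    }

    /** Decompress only the first 'prefix' bytes of 'compressed', up to 'maxRaw' bytes of output. */
    private static byte[] decompressPrefix(byte[] compressed, int prefix, int maxRaw)
            throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(compressed, 0, Math.min(prefix, compressed.length));
        byte[] out = new byte[maxRaw];
        int n = 0;
        while (n < maxRaw && !inf.finished() && !inf.needsInput()) {
            int got = inf.inflate(out, n, maxRaw - n);
            if (got == 0) {
                break;                                       // the prefix is exhausted
            }
            n += got;
        }
        inf.end();
        return Arrays.copyOf(out, n);
    }
}

The result is a self-contained zlib stream of at most pageSize bytes; padding it out to the fixed page length is left to the caller, and in the rare case where the third pass still overshoots, the trim-and-recompress step would simply have to be repeated.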
Upvotes: 1