Reputation: 1431
I was presented with a situation where a file in a proprietary format was compressed to a .gz, then renamed back to its original extension, and then compressed again. I would like to catch such a scenario and wonder whether there is a way to detect when a file has been compressed twice.
I am reading the .gz files as follows:
GZIPInputStream gzip = new GZIPInputStream(Files.newInputStream(inFile));
BufferedReader breader = new BufferedReader(new InputStreamReader(gzip));
Upvotes: 4
Views: 1592
Reputation: 8348
You can check for a valid gzip header within the file. A gzip file should start with a defined header whose first two bytes are 0x1f and 0x8b (see the spec). You can read these bytes and check whether they match the header values:
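import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

// try-with-resources closes the stream even if reading fails
try (InputStream is = new FileInputStream(new File(filePath))) {
    byte[] b = new byte[2];
    int n = is.read(b);
    if (n != 2) {
        // fewer than 2 bytes could be read: not a gzip file
    } else if (b[0] == (byte) 0x1f && b[1] == (byte) 0x8b) {
        // 2-byte gzip header found
    }
}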
These two bytes alone have a roughly 1 in 65,536 chance of occurring at random, which may or may not be enough to base your decision on, depending on the data you expect to receive. To be more confident, you can read further into the header to check that it follows valid spec values (see the spec mentioned above; e.g. the third byte is typically, but not always, 8 for DEFLATE compression, and so on).
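To apply this to the double-compression case from the question, one option is to open the file as gzip once and run the same header check on the first bytes of the decompressed data. The following is only a sketch: the class name `GzipHeaderCheck` and its methods are made up for illustration, and it assumes the outer stream really is gzip (otherwise the `GZIPInputStream` constructor throws).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
final class GzipHeaderCheck {

    // True if the first three bytes look like a gzip header:
    // ID1 = 0x1f, ID2 = 0x8b, CM = 8 (DEFLATE).
    static boolean looksLikeGzip(byte[] head, int length) {
        return length >= 3
                && head[0] == (byte) 0x1f
                && head[1] == (byte) 0x8b
                && head[2] == 8;
    }

    // Fills buf as far as possible; a single read() may return fewer bytes.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    // Assumes 'in' is a gzip stream. Decompresses only the first few bytes
    // and reports whether the payload itself starts with a gzip header.
    static boolean isDoubleGzipped(InputStream in) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(in)) {
            byte[] head = new byte[3];
            int n = readFully(gzip, head);
            return looksLikeGzip(head, n);
        }
    }

    // Test helper: gzip a byte array in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] once = gzip("hello".getBytes(StandardCharsets.UTF_8));
        byte[] twice = gzip(once);
        System.out.println(isDoubleGzipped(new ByteArrayInputStream(once)));  // false
        System.out.println(isDoubleGzipped(new ByteArrayInputStream(twice))); // true
    }
}
```

With a file you would pass `Files.newInputStream(inFile)` instead of the in-memory stream. Since this is still only a header check, a proprietary format that happens to start with the same three bytes would give a false positive.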
Upvotes: 2
Reputation: 140633
A brute-force way would be: uncompress the file, and if that works, try to uncompress the result again. If that works too, you know the file was compressed at least twice. But in the worst case the result could still be compressed, so you would have to repeat this until decompression fails.
And actually, I don't know of other ways to figure that out.
You see, in the end, compression is about changing the bytes of your file. So even when the second compression doesn't do much to the content of the file, it still changes some bytes. So just from looking at those bytes, you won't see what is going on.
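The brute-force approach could be sketched roughly like this. The class `GzipLayerCounter` and its method names are hypothetical, the `maxLayers` cap is an added safety assumption, and the sketch holds the whole content in memory, so it only suits files of moderate size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
final class GzipLayerCounter {

    // Returns how many times 'data' can be gunzipped in a row, up to maxLayers.
    static int countLayers(byte[] data, int maxLayers) {
        int layers = 0;
        byte[] current = data;
        while (layers < maxLayers) {
            byte[] next = tryGunzip(current);
            if (next == null) {
                break; // not (or no longer) valid gzip
            }
            current = next;
            layers++;
        }
        return layers;
    }

    // Fully decompresses 'data', or returns null if it is not valid gzip.
    static byte[] tryGunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // ZipException (bad header or data) and EOFException are both IOExceptions
            return null;
        }
    }

    // Test helper: gzip a byte array in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] twice = gzip(gzip(plain));
        System.out.println(countLayers(plain, 5)); // 0
        System.out.println(countLayers(twice, 5)); // 2
    }
}
```

Unlike a header check, this decompresses every layer completely, so the trailing CRC of each layer is verified and an accidental match on the first bytes is unlikely to be counted as a layer.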
Upvotes: 1