Reputation: 4984
I'm not sure this board is the right place for this question, but I really couldn't find a better one, so let me apologize in advance.
I'm trying to read a third party database for interoperability purposes and I'm having a very hard time with one specific table. This table has two columns: blobSize and blob. Blob size is an integer and blob is a byte array.
I'm guessing this field is zipped, based on two observations:
1) blobSize does not match the actual size of the blob field. For example, the blob posted at the end of this question has 294 bytes, while blobSize reports a size of 2560.
2) The blob starts with 0x50 0x4B 0x01 0x02 (P K 1 2), which looks very much like the central directory header of a zip file (https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html#datadescriptor). But zip files have the zipped data in the beginning of the file and the central directory is in the end of the file. The blob starts with something similar to the zip central directory and is then followed by a lot of data, which is the reverse order.
I tried to decompress the data with the SevenZipSharp and Xceed Zip libraries, without success. Since this data is generated inside the application (rather than by zipping a file), the blob would not contain any information about file name, size, modification date, etc., and these libraries expect the data to come from a file.
I also tried to locate each element of the central directory in the bytes, and they seem to follow what is specified in the zip file format. One notable piece of information in the central directory section is the compression method, which in this database field is '0x09 0x00', which should be Enhanced Deflate (Deflate64).
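To make the field-by-field check above concrete, here is a minimal sketch that reads the fixed 46-byte central directory file header layout from the zip specification. The class and member names (`CentralDirectoryHeader`, `Parse`, `Sample`) are made up for illustration, and `Sample` is simply the first 46 bytes of the blob posted in this question. It assumes the blob really does follow that layout, which is not confirmed.

```csharp
using System;

public static class CentralDirectoryHeader
{
    // First 46 bytes of the sample blob from the question.
    public static readonly byte[] Sample =
    {
        0x50, 0x4B, 0x01, 0x02, // signature "PK\x01\x02"
        0x15, 0x00,             // version made by
        0x15, 0x00,             // version needed to extract
        0x04, 0x00,             // general purpose flags
        0x09, 0x00,             // compression method
        0xC0, 0x48,             // last modification time
        0x0E, 0x47,             // last modification date
        0x00, 0x00, 0xC0, 0x48, // CRC-32
        0xFF, 0xFF, 0xFF, 0xFF, // compressed size
        0x00, 0x00, 0x00, 0x00, // uncompressed size
        0x00, 0x00,             // file name length
        0x00, 0x00,             // extra field length
        0x00, 0x00,             // file comment length
        0x00, 0x00,             // disk number start
        0xFF, 0xFF,             // internal file attributes
        0x00, 0x00, 0x00, 0x00, // external file attributes
        0x00, 0x00, 0x00, 0x00  // relative offset of local header
    };

    // Reads a little-endian 16-bit value at the given offset.
    private static ushort U16(byte[] b, int o)
    {
        return (ushort)(b[o] | (b[o + 1] << 8));
    }

    // Reads a little-endian 32-bit value at the given offset.
    private static uint U32(byte[] b, int o)
    {
        return (uint)b[o] | ((uint)b[o + 1] << 8) | ((uint)b[o + 2] << 16) | ((uint)b[o + 3] << 24);
    }

    // Returns the compression method, the two size fields, and the offset
    // of the first byte after the header and its variable-length fields.
    public static (ushort Method, uint CompressedSize, uint UncompressedSize, int DataOffset) Parse(byte[] b)
    {
        if (b == null || b.Length < 46 || U32(b, 0) != 0x02014B50)
            throw new ArgumentException("Not a central directory file header.");

        int nameLength = U16(b, 28);
        int extraLength = U16(b, 30);
        int commentLength = U16(b, 32);

        return (U16(b, 10), U32(b, 20), U32(b, 24), 46 + nameLength + extraLength + commentLength);
    }
}
```

Calling `CentralDirectoryHeader.Parse(CentralDirectoryHeader.Sample)` yields compression method 9, a compressed size of 0xFFFFFFFF, an uncompressed size of 0, and a data offset of 46, so under this reading the remaining bytes of the blob would start right after the fixed header.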
Maybe I don't know how to decompress this data with these libraries, or maybe it isn't even a compressed field. Perhaps someone more experienced with compressed data or zip files can point me in the right direction.
This data should contain geometric information about some database elements. I also don't think it is an encrypted field, because all the other data in the database is in binary format but unencrypted, and I managed to read all of it. This is the only field that is giving me headaches.
As an example, here are the contents of one row:
blobSize: 2560
blob:
string hexa = "504B01021500150004000900C0480E470000C048FFFFFFFF000000000000000000000000FFFF0000000000000000BB705EF0C1C28D520F19D0801D0333C3BFFF9C0C6C48E28C4036088381000303139001E2FFFBFFFF3F44908101C81C550D007F81B1058A3F181E550D883C2A445610433E1096302830B832E401E922864A5856268A16636085E77978D9804367C164959D9DD72E303283E4A18A5D80F6BA31C433043384300401D98E0CBE3874631716636062440E06ECAA30456F610A912D428EFD645B86452325F683A201548E83E20454068CB6070037D552A6591CB3F7E3A0EDA1E2705A80C119585EE4321400C962864C608991CAE00EC4203103206460C06112CC06B849309365817AF0800FF6782491A43ED82FE3021B09564FA82842D238B29900";
// Convert each pair of hex characters into one byte (needs using System; and using System.Linq;).
byte[] bytes = Enumerable.Range(0, hexa.Length)
    .Where(x => x % 2 == 0)
    .Select(x => Convert.ToByte(hexa.Substring(x, 2), 16))
    .ToArray();
Upvotes: 2
Views: 399
Reputation: 151720
"But zip files have the zipped data in the beginning of the file and the central directory is in the end of the file."
That's true, but according to the page you linked, each zip file header starts with the letters "PK" (0x50 0x4B). This indicates that the blob at least looks like a zip file, and you could try to read it as such.
See "Unzip a memorystream (Contains the zip file) and get the files" for examples.
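As a rough sketch of what "reading it as a zip" from memory could look like, the snippet below uses `System.IO.Compression.ZipArchive` over a `MemoryStream`. The `ZipProbe` class and `TryListEntries` method are hypothetical names chosen for illustration. Note that `ZipArchive` looks for an end-of-central-directory record at the end of the stream, so a blob that only begins with a central directory header may well be rejected; in that case the method returns null instead of throwing.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class ZipProbe
{
    // Returns the entry names if the bytes parse as a zip archive,
    // or null if ZipArchive rejects the data as malformed.
    public static string[] TryListEntries(byte[] data)
    {
        try
        {
            using (var stream = new MemoryStream(data))
            using (var archive = new ZipArchive(stream, ZipArchiveMode.Read))
            {
                return archive.Entries.Select(e => e.FullName).ToArray();
            }
        }
        catch (InvalidDataException)
        {
            // Thrown when the stream is not a readable zip archive.
            return null;
        }
    }
}
```

With the `bytes` array from the question, `ZipProbe.TryListEntries(bytes)` would show quickly whether the standard library accepts the blob as an archive at all.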
Upvotes: 1