Let's say we compress some large binary serialized object by using the GZipStream class of the System.IO.Compression namespace. The output will be a Base64 string of some compressed byte array. At some later point the Base64 string gets converted back to a byte array and the data needs to be decompressed.
While compressing the data we create a new byte array with the size of the compressed byte array + 4. In the first 4 bytes we store the length of the original (uncompressed) data, and we then BlockCopy the length and the compressed data into the new array. This new array gets converted into a Base64 string.
While decompressing we convert the Base64 string back into a byte array. We can then extract the original data length using the BitConverter class, which reads an Int32 from the first 4 bytes. We allocate a byte array of that length and let the GZipStream decompress directly into it.
I can't imagine that something like this actually has any benefit at all. It adds more complexity to the code and more operations need to be executed. Readability is reduced too. The BlockCopy operations alone should consume enough resources that this just cannot have a benefit, right?
byte[] buffer = new byte[0xffff]; // Some large binary serialized object.

// Compress in-memory.
using (var mem = new MemoryStream())
{
    // The actual compression takes place here.
    using (var zipStream = new GZipStream(mem, CompressionMode.Compress, true))
    {
        zipStream.Write(buffer, 0, buffer.Length);
    }

    // Copy the compressed bytes out of the MemoryStream.
    var compressedData = new byte[mem.Length];
    mem.Position = 0;
    mem.Read(compressedData, 0, compressedData.Length);

    /* Increase the size by 4 to accommodate an Int32 that
    ** will store the length of the original (uncompressed) data. */
    var zipBuffer = new byte[compressedData.Length + 4];
    // Store the compressedData array after the first 4 bytes.
    Buffer.BlockCopy(compressedData, 0, zipBuffer, 4, compressedData.Length);
    // Store the length of the original data in the first 4 bytes.
    Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, zipBuffer, 0, 4);

    return Convert.ToBase64String(zipBuffer);
}
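(As an aside, the explicit Position reset and Read call above could probably be replaced by a single MemoryStream.ToArray() call; a minimal sketch, not part of the original code:)

// ToArray() copies the stream's contents into a new array,
// so the manual Position/Read steps are not needed.
var compressedData = mem.ToArray();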
byte[] zipBuffer = Convert.FromBase64String("some base64 string");

using (var inStream = new MemoryStream())
{
    // The length of the original data, read from the first 4 bytes.
    int dataLength = BitConverter.ToInt32(zipBuffer, 0);
    // Allocate an array of exactly that size for the decompressed data.
    byte[] buffer = new byte[dataLength];

    // Write the compressed data (everything after the 4-byte header) to the stream.
    inStream.Write(zipBuffer, 4, zipBuffer.Length - 4);
    inStream.Position = 0;

    // Decompress the data into the buffer.
    using (var zipStream = new GZipStream(inStream, CompressionMode.Decompress))
    {
        zipStream.Read(buffer, 0, buffer.Length);
    }

    ... code
    ... code
    ... code
}
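For comparison, here is a minimal sketch of decompression without the 4-byte length prefix, assuming the Base64 string contains only the raw GZip data (the method name is illustrative, not part of the code above):

// Assumes using System; using System.IO; using System.IO.Compression;
static byte[] DecompressWithoutPrefix(string base64)
{
    byte[] gzipData = Convert.FromBase64String(base64);
    using (var inStream = new MemoryStream(gzipData))
    using (var zipStream = new GZipStream(inStream, CompressionMode.Decompress))
    using (var outStream = new MemoryStream())
    {
        // CopyTo reads until the GZip stream ends, so the original
        // length does not have to be known up front.
        zipStream.CopyTo(outStream);
        return outStream.ToArray();
    }
}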
Upvotes: 2
Views: 582
Reputation: 9824
You tagged the question as C#, which means .NET, so the question is moot:
The Framework already stores the length with the array. That is how the array classes do the sanity checks on indexers, and it is how managed code prevents buffer overflow attacks. That safety alone is worth any minor inefficiency (and note that the JIT is actually able to prune most of the checks: in a loop bounded by the array's Length, for example, it can drop the per-iteration bounds check).
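A minimal sketch of that built-in bookkeeping, purely for illustration:

byte[] data = new byte[16];

// The CLR stores the length with the array object itself.
Console.WriteLine(data.Length); // 16

// Every indexer access is bounds-checked by the runtime.
try
{
    Console.WriteLine(data[32]); // past the end
}
catch (IndexOutOfRangeException)
{
    Console.WriteLine("Out-of-range access throws instead of overflowing.");
}

// In a loop bounded by data.Length, the JIT can typically
// eliminate the per-iteration bounds check.
for (int i = 0; i < data.Length; i++)
{
    data[i] = 0xff;
}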
You would have to go all the way into unmanaged code and handle naked pointers to have any hope of getting rid of it. But why would you? The difference is so small it falls under the speed rant. If it matters, you probably have a real-time programming case, and starting one of those with .NET was a bad idea in the first place.
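For completeness, this is roughly what "naked pointers" look like in C# (requires an unsafe context and compiling with /unsafe; a sketch, not a recommendation):

byte[] data = new byte[16];

unsafe
{
    fixed (byte* p = data)
    {
        // Raw pointer arithmetic bypasses the runtime bounds check;
        // nothing here would stop an access past the end of the array.
        for (int i = 0; i < 16; i++)
        {
            *(p + i) = 0xff;
        }
    }
}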
Upvotes: 1