Reputation: 93
I am making a serialization code and I´m trying to make the byte output smallest as possible.
My approach is packing zero bytes like this:
ex: D3 00 00 00 00 00 00 00 00 00 00 28 85 (12 bytes)
turns into 09(zero array length) 01 (offset) D3 28 85 (5 bytes)
My question is: There any efficient way to do this? I use a List to store and concatenate byte arrays, and the data have a "Header" to identify what kind of data the byte array are. Just stuck in this.
Suggestions?
Edit: This is the code for serialization so far.
bool start = false;
byte offsetStart = 0;
byte offsetEnd = 0;
for(int i = 0; i < array.Length; i++)
{
if(array[i] == 0 && !start)
{
offsetStart = (byte)i;
start = true;
}
else if (array[i] != 0 && start)
{
offsetEnd = (byte)i;
start = false;
}
// XX 00 XX is fine. XX 00 00 XX is also fine.
if(offsetEnd - offsetStart <= 3 ) return array;
else
{
byte arrayLength = offsetEnd - offsetStart;
return arrayLength + offsetStart +
Array.Take(offsetStart).Skip(ArrayLength).Take(array.Length - offsetEnd);
}
}
Edit 2: The code to reach I was looking is this, I am testing it yet.
public static int[] MapIntervals(byte[] inputBytes)
{
//First step: Map the bytes
bool startMap = false;
int[] mapIndex = new int[2];
int startIndex = 0;
for(int index = 0; index < inputBytes.Length; index++)
{
if (inputBytes[index] == 0 && !startMap)
{
startIndex = index;
startMap = true;
}
else if ((inputBytes[index] != 0 || index == inputBytes.Length - 1) && startMap)
{
if(index - startIndex >= 3)
{
mapIndex[0] = startIndex;
mapIndex[1] = index - startIndex;
return mapIndex;
}
else
{
startMap = false;
startIndex = 0;
index--;
}
}
}
return mapIndex;
}
//Step 2: Get Intervals
public static byte[] ShrinkZeros(byte[] inputBytes)
{
if (inputBytes.Length == 0) return inputBytes;
List<byte> bytes = new List<byte>(inputBytes);
List<byte> intervals = new List<byte>();
while (true)
{
int[] maparray = MapIntervals(bytes.ToArray());
//Here is the escape sequence of the loop
if (maparray.Length == 0 || maparray[0] == maparray[1])
{
if (intervals.Count == 0) return inputBytes;
var returnBytes = new List<byte>();
returnBytes.Add((byte)(intervals.Count / 2));
returnBytes.AddRange(intervals);
returnBytes.AddRange(bytes);
return returnBytes.ToArray();
}
else
{
intervals.Add((byte)maparray[0]);
intervals.Add((byte)maparray[1]);
var head = bytes.Take(maparray[0]).ToList();
var tail = bytes.Skip(maparray[0] + maparray[1]).ToArray();
bytes = head;
bytes.AddRange(tail);
}
}
}
The serialization works fine, I am work in deserialization of it at moment.
Also, like @Jeroen van Langen points, this works also putting a sequence of N zero-bytes N non-zero-bytes DATA
My question still alive. It´s seems to do the work or are best methods to do this?
Upvotes: 0
Views: 677
Reputation: 22083
You should go for a simple protocol. Create something like a header. For example. Start with a byte which indicates if you're starting with Zero's (0) or NonZero's (1). Zero's will be compressed, non-zero's not. So the first byte will be a zero or one. The second byte contains the count. If it were zero's, just write nothing, if it were non-zero's the values will be behind it.
Example:
[0][10][1][10]1234567891[0][10]
[0][9][1][12]123456789123[0][9]
So when decompressing:
[0][10] means, add 10 zero's,
[1][10]1234567891 means, add 10 non-zero's, followed by the values
[0][9] means, add 9 zero's,
[1][12]123456789123 means,add 12 non-zero's, followed by the values
This way the zero's aren't "saved". It is easily to compress and decompress.
If you have some difficulty writing the code, i'll add some.
Upvotes: 1