Fernando Rodrigues
Fernando Rodrigues

Reputation: 93

Best way to remove extra zeroes in byte array

I am making a serialization code and I´m trying to make the byte output smallest as possible.

My approach is packing zero bytes like this:

ex: D3 00 00 00 00 00 00 00 00 00 00 28 85 (12 bytes)
turns into 09(zero array length) 01 (offset) D3 28 85 (5 bytes)

My question is: There any efficient way to do this? I use a List to store and concatenate byte arrays, and the data have a "Header" to identify what kind of data the byte array are. Just stuck in this.

Suggestions?

Edit: This is the code for serialization so far.

bool start = false;
byte offsetStart = 0; 
byte offsetEnd = 0;
for(int i = 0; i < array.Length; i++)
{
    if(array[i] == 0 && !start)
    {
         offsetStart = (byte)i;
         start = true;
    } 
    else if (array[i] != 0 && start)
    {
         offsetEnd = (byte)i;
         start = false;
    }

    // XX 00 XX is fine. XX 00 00 XX is also fine.
    if(offsetEnd - offsetStart <= 3 ) return array;
    else 
    {
        byte arrayLength = offsetEnd - offsetStart;
        return arrayLength + offsetStart + 
        Array.Take(offsetStart).Skip(ArrayLength).Take(array.Length - offsetEnd);

    }
}

Edit 2: The code to reach I was looking is this, I am testing it yet.

 public static int[] MapIntervals(byte[] inputBytes)
        {
            //First step: Map the bytes
            bool startMap = false;
            int[] mapIndex = new int[2];
            int startIndex = 0;
            
            for(int index = 0; index < inputBytes.Length; index++)
            {
                if (inputBytes[index] == 0 && !startMap)
                {
                    startIndex = index;
                    startMap = true;
                }
                else if ((inputBytes[index] != 0 || index == inputBytes.Length - 1) && startMap)
                {
                    if(index - startIndex >= 3)
                    {
                        mapIndex[0] = startIndex;
                        mapIndex[1] = index - startIndex;
                        return mapIndex;
                    }
                    else
                    {
                        startMap = false;
                        startIndex = 0;
                        index--;
                    }
                }
            }

            
            return mapIndex;
        }
        
        //Step 2: Get Intervals
        public static byte[] ShrinkZeros(byte[] inputBytes)
        {
            if (inputBytes.Length == 0) return inputBytes;
            List<byte> bytes = new List<byte>(inputBytes);
            List<byte> intervals = new List<byte>();
            while (true)
            {
                int[] maparray = MapIntervals(bytes.ToArray());

                //Here is the escape sequence of the loop
                if (maparray.Length == 0 || maparray[0] == maparray[1])
                {
                    if (intervals.Count == 0) return inputBytes;
                    var returnBytes = new List<byte>();
                    returnBytes.Add((byte)(intervals.Count / 2));
                    returnBytes.AddRange(intervals);
                    returnBytes.AddRange(bytes);
                   
                    return returnBytes.ToArray();
                }
                else
                {
                    intervals.Add((byte)maparray[0]);
                    intervals.Add((byte)maparray[1]);

                    var head = bytes.Take(maparray[0]).ToList();
                    var tail = bytes.Skip(maparray[0] + maparray[1]).ToArray();
                    bytes = head;
                    bytes.AddRange(tail);
               
                }
            }
        }

The serialization works fine, I am work in deserialization of it at moment.

Also, like @Jeroen van Langen points, this works also putting a sequence of N zero-bytes N non-zero-bytes DATA

My question still alive. It´s seems to do the work or are best methods to do this?

Upvotes: 0

Views: 677

Answers (1)

Jeroen van Langen
Jeroen van Langen

Reputation: 22083

You should go for a simple protocol. Create something like a header. For example. Start with a byte which indicates if you're starting with Zero's (0) or NonZero's (1). Zero's will be compressed, non-zero's not. So the first byte will be a zero or one. The second byte contains the count. If it were zero's, just write nothing, if it were non-zero's the values will be behind it.

Example:

[0][10][1][10]1234567891[0][10]
[0][9][1][12]123456789123[0][9]

So when decompressing:

[0][10] means, add 10 zero's, 
[1][10]1234567891 means, add 10 non-zero's, followed by the values
[0][9]  means, add 9 zero's, 
[1][12]123456789123 means,add 12 non-zero's, followed by the values

This way the zero's aren't "saved". It is easily to compress and decompress.

If you have some difficulty writing the code, i'll add some.

Upvotes: 1

Related Questions