Tom Wright
Tom Wright

Reputation: 11489

How could I encode a string of 1s and 0s for transport?

For a genetic algorithm application, I'm using a whole load of binary strings. Most of the time they literally take the form of 01001010110, so that they can be mated, mutated and "crossed-over".

For transport and storage however, this seems wasteful. What's the simplest way to encode this as a shorter string?

I'm guessing this is pretty trivial, but I'm not sure where to start looking.

Update: I actually need to end up with another string: one of the transport requests will be GET requests.

Upvotes: 2

Views: 1716

Answers (5)

Pedery
Pedery

Reputation: 3636

Or implement Run length encoding or Huffman coding. Both are fairly easy to implement. RLE is by far the easiest, but will in most cases have worse compression ratio. If your data typically has many consecutive characters of the same value, it could still provide a substantial improvement.

Upvotes: 1

Hamish Grubijan
Hamish Grubijan

Reputation: 10830

Abe Miessler's answer is a good one, but with caveat mentioned in comments.

If 64 bits is not enough to represent your string, then consider using a BigInt class http://www.codeproject.com/KB/cs/BigInt.aspx (you would probably want to add to/fromBinary() extension methods to it. Alternatively represent this as a ... linked list of bytes.

Either approach has the problem of discarding any leading zeroes, so you would want to store the original length as well.

Upvotes: 0

Mark Byers
Mark Byers

Reputation: 838696

The simplest would be to take each digit and treat it as a bit. Each group of 8 bits can be stored in a byte. Then you can send it as a stream of bytes. You will also need to store the length of the original string so that you can distinguish between "0" and "00".

Here is one way you could write the conversion from string to a byte array:

byte[] convertToBytes(string s)
{
    byte[] result = new byte[(s.Length + 7) / 8];

    int i = 0;
    int j = 0;
    foreach (char c in s)
    {
        result[i] <<= 1;
        if (c == '1')
            result[i] |= 1;
        j++;
        if (j == 8)
        {
            i++;
            j = 0;
        }
    }
    return result;
}

Reversing the operation is very similar.

If you need to transmit the data as a string you can base 64 encode the resulting byte array.

You may also want to consider keeping it in this form in memory too. This will be much more efficient than storing it as a string where each digit is stored as a 2 byte character. You are using roughly 16 times more memory than you need to for storing your data. The disadvtange is that it is slightly more difficult to use in this form, so if you have enough memory then what you are currently doing might be just fine.

Upvotes: 8

Abe Miessler
Abe Miessler

Reputation: 85096

What about converting it to it's base 10 integer equivalent?

int myBin = Convert.ToInt32("01001010110", 2);

Convert.ToInt32() documentation

Upvotes: 2

Scott Chamberlain
Scott Chamberlain

Reputation: 127593

I would just store them as an array of bytes and use a helper function to translate between the byte array version and the string version.

Upvotes: 1

Related Questions