czhili

Reputation: 83

How can I quickly encode and then compress a short string containing numbers in C#?

I have strings that look like this:

000101456890
348324000433
888000033380

They are strings that are all the same length and they contain only numbers.

I would like to find a way to encode and then compress (reduce the length of) the strings. The compression algorithm would need to compress down to ASCII characters only, as the results will be used as web page links.

So for example:

www.stackoverflow.com/000101456890  goes to www.stackoverflow.com/aJks

Is there some way I could do this, some method that would do the job of compressing quickly?

Thanks,

Upvotes: 6

Views: 6633

Answers (2)

nakhli

Reputation: 4059

I'm not sure Base 64 is URL-safe, since it has '/' in its index table (the Pack function provided in the selected answer will yield strings that are not URL-safe).

You can consider replacing the '/' symbol with something more URL-friendly, or use another base. Base 62 will do here, for instance.

Here is some generic code that translates back and forth between decimal and any numeral base <= 64 (it's probably faster than converting to bytes and then using Convert.ToBase64String()):

static void Main()
{
    Console.WriteLine(Decode("101456890", 10));
    Console.WriteLine(Encode(101456890, 62));
    Console.WriteLine(Decode("6rhZS", 62));
    //Result:
    //101456890
    //6rhZS
    //101456890
}

public static long Decode(string str, int baze)
{
    long result = 0;
    long place = 1; // long, not int: the place value exceeds int.MaxValue for 12-digit inputs
    for (int i = 0; i < str.Length; ++i)
    {
        result += Value(str[str.Length - 1 - i]) * place;
        place *= baze;
    }

    return result;
}

public static string Encode(long val, int baze)
{
    var buffer = new char[64];
    int place = 0;
    long q = val;
    do
    {
        buffer[place++] = Symbol(q % baze);
        q = q / baze;
    }
    while (q > 0);

    Array.Reverse(buffer, 0, place);
    return new string(buffer, 0, place);
}

public static long Value(char c)
{
    if (c == '+') return 62;
    if (c == '/') return 63;
    if (c < '0') throw new ArgumentOutOfRangeException("c");
    if (c < ':') return c - '0';
    if (c < 'A') throw new ArgumentOutOfRangeException("c");
    if (c < '[') return c - 'A' + 10;
    if (c < 'a') throw new ArgumentOutOfRangeException("c");
    if (c < '{') return c - 'a' + 36;
    throw new ArgumentOutOfRangeException("c");
}

public static char Symbol(long i)
{
    if (i < 0) throw new ArgumentOutOfRangeException("i");
    if (i < 10) return (char)('0' + i);
    if (i < 36) return (char)('A' + i - 10);
    if (i < 62) return (char)('a' + i - 36);
    if (i == 62) return '+';
    if (i == 63) return '/';
    throw new ArgumentOutOfRangeException("i");
}
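
As a rough sketch of how this applies to the 12-digit strings from the question: the leading zeros are lost once the string is parsed as a number, so they have to be restored by padding when decoding. The ShortId class and its Shorten/Expand names below are made up for illustration, and the alphabet string is assumed to follow the same ordering as Symbol() above for values 0 to 61.

```csharp
using System;

static class ShortId
{
    // Assumed alphabet: digits, then upper case, then lower case (values 0..61).
    const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Parses the digit string as a number and rewrites it in base 62.
    public static string Shorten(string digits)
    {
        long value = long.Parse(digits);
        if (value == 0) return "0";

        string result = "";
        while (value > 0)
        {
            result = Alphabet[(int)(value % 62)] + result;
            value /= 62;
        }
        return result;
    }

    // Reverses Shorten and pads with zeros back to the original fixed width.
    public static string Expand(string code, int width)
    {
        long value = 0;
        foreach (char c in code)
        {
            int digit = Alphabet.IndexOf(c);
            if (digit < 0) throw new FormatException("Not a base-62 symbol: " + c);
            value = value * 62 + digit;
        }
        return value.ToString().PadLeft(width, '0');
    }

    static void Main()
    {
        string code = Shorten("000101456890");
        Console.WriteLine(code);              // 6rhZS
        Console.WriteLine(Expand(code, 12));  // 000101456890
    }
}
```

Since 62^7 is larger than 10^12, any 12-digit string should come out at 7 characters or fewer this way.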

Upvotes: 3

Marc Gravell

Reputation: 1064114

To do it simply, you could consider each as a long (plenty of room there), and hex-encode; that gives you:

60c1bfa
5119ba72b1
cec0ed3264

Base-64 would be shorter, but you'd need to treat the value as big-endian (note that most .NET platforms are little-endian) and drop the leading zero bytes. That gives you:

Bgwb+g==
URm6crE=
zsDtMmQ=

For example:

    static void Main()
    {
        long x = 000101456890L, y = 348324000433L, z = 888000033380L;

        Console.WriteLine(Convert.ToString(x, 16));
        Console.WriteLine(Convert.ToString(y, 16));
        Console.WriteLine(Convert.ToString(z, 16));

        Console.WriteLine(Pack(x));
        Console.WriteLine(Pack(y));
        Console.WriteLine(Pack(z));

        Console.WriteLine(Convert.ToInt64("60c1bfa", 16).ToString().PadLeft(12, '0'));
        Console.WriteLine(Convert.ToInt64("5119ba72b1", 16).ToString().PadLeft(12, '0'));
        Console.WriteLine(Convert.ToInt64("cec0ed3264", 16).ToString().PadLeft(12, '0'));

        Console.WriteLine(Unpack("Bgwb+g==").ToString().PadLeft(12, '0'));
        Console.WriteLine(Unpack("URm6crE=").ToString().PadLeft(12, '0'));
        Console.WriteLine(Unpack("zsDtMmQ=").ToString().PadLeft(12, '0'));

    }
    static string Pack(long value)
    {
        ulong a = (ulong)value; // make shift easy
        List<byte> bytes = new List<byte>(8);
        while (a != 0)
        {
            bytes.Add((byte)a);
            a >>= 8;
        }
        bytes.Reverse();
        var chunk = bytes.ToArray();
        return Convert.ToBase64String(chunk);
    }
    static long Unpack(string value)
    {
        var chunk = Convert.FromBase64String(value);
        ulong a = 0;
        for (int i = 0; i < chunk.Length; i++)
        {
            a <<= 8;
            a |= chunk[i];
        }
        return (long)a;
    }
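
Because these strings are meant for links, the '+', '/' and '=' characters that Convert.ToBase64String can emit may cause trouble in a URL. One common workaround is the "base64url" alphabet from RFC 4648, which uses '-' and '_' instead and allows the padding to be dropped. The following is only a sketch of that idea; the UrlSafeBase64 class and its method names are hypothetical helpers, not part of the Pack/Unpack code above.

```csharp
using System;

static class UrlSafeBase64
{
    // Converts standard Base64 text to the URL-safe alphabet and drops '=' padding.
    public static string ToUrlSafe(string base64)
    {
        return base64.TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }

    // Restores the standard alphabet and padding so Convert.FromBase64String accepts it.
    public static string FromUrlSafe(string urlSafe)
    {
        string s = urlSafe.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
        }
        return s;
    }

    static void Main()
    {
        Console.WriteLine(ToUrlSafe("Bgwb+g=="));  // Bgwb-g
        Console.WriteLine(FromUrlSafe("Bgwb-g"));  // Bgwb+g==
    }
}
```

With helpers like these, the output of Pack would go through ToUrlSafe before being placed in a link, and the incoming value through FromUrlSafe before being handed to Unpack.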

Upvotes: 8
