webdev
webdev

Reputation: 31

Efficiency of custom compression algorithm

I have a compression algorithm idea and I have two questions:

  1. Should I deal with it ? Will it be efficient ?
  2. How can I optimize it?

Here is the algorithm I've created so far.

int i = 0,j, diff, beginIndex = 0;
while(i < tmp.length){
    j = i;
    byte first = tmp[i];
    int total = 0;
    while(j < tmp.length && first == tmp[j] && total < 127){ j++; total++;}

    if(total > 3){
        if(beginIndex != i){
            diff = i - beginIndex;
            packed.put((byte)diff);
            packed.put(tmp, beginIndex, diff);
        }
        packed.put((byte)(0x80 | total));
        packed.put(tmp[i]);
        beginIndex = j; 
    } 

    i = j;

    if(i-beginIndex == 127){
        packed.put((byte)127);
        packed.put(tmp, beginIndex, 127);
        beginIndex = i;
    }
}

if(beginIndex < i){
    diff = i - beginIndex;
    packed.put((byte)diff);
    packed.put(tmp, beginIndex, diff);
}

Example input (each letter describes a byte)

[A, B, C, D, E, E, B, B, A, A, A, A, A, A, A, A, A, A, A, A, A, B, B, B, B, C, C] = 27 bytes

Example output

[0x80, A, B, C, D, E, E, B, B, 0x8D, A, 0x84, B, 0x82, C, C] = 16 bytes

In examples 0x80 is the packed bit. Represents if following letter will be repeated. 0xFF - 0x80 = 0x7F is the maksimum repeat count (127). So, 0x8D represents following byte will be repeated 0xD (13) times

Any idea optimizing that algorithm? Will it be useful or shall I get rid off the idea?

Upvotes: 3

Views: 189

Answers (2)

gordy
gordy

Reputation: 9786

You've invented run-length encoding. Most compression algorithms already incorporate a sort of run-length encoding that will out perform your implementation and work better in more cases. So I wouldn't pursue it if I were you.

If you're interested in data compression in general I highly recommend Managing Gigabytes chapters 2 and 6 as a very accessible read.

Upvotes: 1

EvgeniyZh
EvgeniyZh

Reputation: 905

The question is, what is the purpose of your algorithm?

To invent something really new, you need to check, what was invented before. Read some papers and book about data compression, etc. Data Compression Explained can be good place to start.

If you just want to practice in writing algorithms, it's totally OK. Continue with improving your algorithm, refactoring, speedups, profiling, etc.

If you want your algorithm to be practical, once again, check what was created before. Open source compression algorithms, such as zlib worth studying.

If you want to check, how your algorithm compares to others, run it on some popular test, such as Silesia Open Source Compression Benchmark. That'll give your an intuition where you stand (this might be somewhat disappointing, but don't give up).

Finally, if you want to have fun, just do whatever you want, and don't listen to anyone.

Upvotes: 1

Related Questions