Reputation: 31
I have a compression algorithm idea and I have two questions:
Here is the algorithm I've created so far.
int i = 0,j, diff, beginIndex = 0;
while(i < tmp.length){
j = i;
byte first = tmp[i];
int total = 0;
while(j < tmp.length && first == tmp[j] && total < 127){ j++; total++;}
if(total > 3){
if(beginIndex != i){
diff = i - beginIndex;
packed.put((byte)diff);
packed.put(tmp, beginIndex, diff);
}
packed.put((byte)(0x80 | total));
packed.put(tmp[i]);
beginIndex = j;
}
i = j;
if(i-beginIndex == 127){
packed.put((byte)127);
packed.put(tmp, beginIndex, 127);
beginIndex = i;
}
}
if(beginIndex < i){
diff = i - beginIndex;
packed.put((byte)diff);
packed.put(tmp, beginIndex, diff);
}
Example input (each letter describes a byte)
[A, B, C, D, E, E, B, B, A, A, A, A, A, A, A, A, A, A, A, A, A, B, B, B, B, C, C] = 27 bytes
Example output
[0x80, A, B, C, D, E, E, B, B, 0x8D, A, 0x84, B, 0x82, C, C] = 16 bytes
In examples 0x80 is the packed bit. Represents if following letter will be repeated. 0xFF - 0x80 = 0x7F is the maksimum repeat count (127). So, 0x8D represents following byte will be repeated 0xD (13) times
Any idea optimizing that algorithm? Will it be useful or shall I get rid off the idea?
Upvotes: 3
Views: 189
Reputation: 9786
You've invented run-length encoding. Most compression algorithms already incorporate a sort of run-length encoding that will out perform your implementation and work better in more cases. So I wouldn't pursue it if I were you.
If you're interested in data compression in general I highly recommend Managing Gigabytes chapters 2 and 6 as a very accessible read.
Upvotes: 1
Reputation: 905
The question is, what is the purpose of your algorithm?
To invent something really new, you need to check, what was invented before. Read some papers and book about data compression, etc. Data Compression Explained can be good place to start.
If you just want to practice in writing algorithms, it's totally OK. Continue with improving your algorithm, refactoring, speedups, profiling, etc.
If you want your algorithm to be practical, once again, check what was created before. Open source compression algorithms, such as zlib worth studying.
If you want to check, how your algorithm compares to others, run it on some popular test, such as Silesia Open Source Compression Benchmark. That'll give your an intuition where you stand (this might be somewhat disappointing, but don't give up).
Finally, if you want to have fun, just do whatever you want, and don't listen to anyone.
Upvotes: 1