Reputation: 302
I'm writing a custom compression algorithm in C that reads in ascii characters, removes the 1st bit from each of them (because it will always be 0), and then sticks it in a new file. It makes the input 7/8 the original size. Here's the compression:
#include <stdio.h>
int main()
{
int i = 1;
int c;
unsigned short value = 0;
while((c = getchar()) != EOF)
{
value = (c << i) | value;
if(i != 1) putchar(value >> 8);
value = value << 8;
i++;
if(i == 9) i = 1;
}
if(i != 1) putchar(value >> 8);
}
and here's the decompression:
#include <stdio.h>
int main() {
int i = 1;
int c;
unsigned char value = 0;
while((c = getchar()) != EOF) {
value = (c >> i) | value;
putchar(value);
value = (c << (8-i)) | 0;
value = value >> 1;
if(++i == 8) {
putchar(value);
i = 1;
}
}
}
If I compress something like "ororororor" (without the quotes), and then decompress it, then the output is "orororor.r", where the "." is 7F in hex. However, if I give it "ororororrr" then it outputs "ororororrr" which is correct. It only fails with certain inputs, but I can't find a pattern to when it messes up.
Sorry that this is not in functions. The way I've been using it is in UNIX with these commands:
echo -n your input here > data
gcc compress.c
./a.out < data > inp
gcc decompress.c
./a.out < inp > out
hexdump -C out
Upvotes: 1
Views: 809
Reputation: 57388
A problem is surely that you do not zero value
when decompressing.
This has no effect (the extra bits get rotated out) until you get at the end of file.
Try:
if(++i == 8) {
putchar(value);
i = 1;
value = 0; // Clean up
}
Test case (modified the above program to only zero value
if there was a command line argument):
echo "xxxxxxxRxx" | ./comp | ./decomp OK
xxxxxxxRxx
echo "xxxxxxxRxx" | ./comp | ./decomp
xxxxxxxRzx
Upvotes: 1
Reputation: 10184
Are you accounting for the situations where the input won't fall on even 8-bit boundaries? Kinda like the problem base 64 encoding has when it does the same kind of thing....
Upvotes: 1