mottese
mottese

Reputation: 302

Custom compression algorithm

I'm writing a custom compression algorithm in C that reads in ascii characters, removes the 1st bit from each of them (because it will always be 0), and then sticks it in a new file. It makes the input 7/8 the original size. Here's the compression:

#include <stdio.h>

int main()
{
  int i = 1;
  int c;
  unsigned short value = 0;

  while((c = getchar()) != EOF)
  {
    value = (c << i) | value;
    if(i != 1) putchar(value >> 8);
    value = value << 8;
    i++;
    if(i == 9) i = 1;
  }
  if(i != 1) putchar(value >> 8);
}

and here's the decompression:

#include <stdio.h>

int main() {

  int i = 1;
  int c;
  unsigned char value = 0;

  while((c = getchar()) != EOF) {
    value = (c >> i) | value;
    putchar(value);

    value = (c << (8-i)) | 0;
    value = value >> 1;

    if(++i == 8) {
      putchar(value);
      i = 1;
    }
  }
}

If I compress something like "ororororor" (without the quotes), and then decompress it, then the output is "orororor.r", where the "." is 7F in hex. However, if I give it "ororororrr" then it outputs "ororororrr" which is correct. It only fails with certain inputs, but I can't find a pattern to when it messes up.

Sorry that this is not in functions. The way I've been using it is in UNIX with these commands:

echo -n your input here > data
gcc compress.c
./a.out < data > inp
gcc decompress.c
./a.out < inp > out
hexdump -C out

Upvotes: 1

Views: 809

Answers (2)

LSerni
LSerni

Reputation: 57388

A problem is surely that you do not zero value when decompressing.

This has no effect (the extra bits get rotated out) until you get at the end of file.

Try:

 if(++i == 8) {
     putchar(value);
     i = 1;
     value = 0; // Clean up
 }

Test case (modified the above program to only zero value if there was a command line argument):

  echo "xxxxxxxRxx" | ./comp | ./decomp OK
  xxxxxxxRxx
  echo "xxxxxxxRxx" | ./comp | ./decomp
  xxxxxxxRzx

Upvotes: 1

David W
David W

Reputation: 10184

Are you accounting for the situations where the input won't fall on even 8-bit boundaries? Kinda like the problem base 64 encoding has when it does the same kind of thing....

Upvotes: 1

Related Questions