Giovanni Far
Giovanni Far

Reputation: 1653

rle compression algorithm c

I have to do a rle algorithm in c with the escape character (Q)

example if i have an input like: AAAAAAABBBCCCDDDDDDEFG
the output have to be: QA7BBBCCCQD6FFG

this is the code that i made:

#include <stdio.h>
#include <stdlib.h>

void main()
{ 
    FILE *source = fopen("Test.txt", "r");
    FILE *destination = fopen("Dest.txt", "w");
    char carCorrente; //in english: currentChar
    char carSucc;     // in english: nextChar
    int count = 1;

    while(fread(&carCorrente, sizeof(char),1, source) != 0) {
        if (fread(&carCorrente, sizeof(char),1, source) == 0){
            if(count<=3){
                for(int i=0;i<count;i++){
                    fprintf(destination,"%c",carCorrente);
                }
            }
            else {
                    fwrite("Q",sizeof(char),1,destination);
                    fprintf(destination,"%c",carCorrente);
                    fprintf(destination,"%d",count);
                }
            break;
        }
        else fseek(source,-1*sizeof(char), SEEK_CUR);

        while (fread(&carSucc, sizeof(char), 1, source) != 0) {
            if (carCorrente ==  carSucc) {
                count++;
            } 
            else {
                if(count<=3){
                    for(int i=0;i<count;i++){
                        fprintf(destination,"%c",carCorrente);
                    }
                }
                else {
                    fwrite("Q",sizeof(char),1,destination);
                    fprintf(destination,"%c",carCorrente);
                    fprintf(destination,"%d",count);
                }

                count = 1;
                goto OUT;
            }
        }

OUT:fseek(source,-1*sizeof(char), SEEK_CUR); //exit 2° while
    }
}

the problem is when i have an input like this: ABBBCCCDDDDDEFGD
in this case the output is: QB4CCCQD5FFDD
and i don't know why :(

Upvotes: 0

Views: 8744

Answers (4)

Giovanni Far
Giovanni Far

Reputation: 1653

Thank you so much, i fixed my algorithm. The problem was a variable, in the first if after the while. Before

if (fread(&carCorrente, sizeof(char),1, source) == 0)

now

if (fread(&carSucc, sizeof(char),1, source) == 0){

for sure all my algorithm is wild. I mean it is too much slow!
i made a test with my version and with the version of Vikram Bhat and i saw how much my algorithm losts time.
For sure with getc() i can save more time.

now i'm thinking about the encoding (decompression) and i can see a little problem.

example:
if i have an input like: QA7QQBQ33TQQ10QQQ
how can i recognize which is the escape character ???

thanks

Upvotes: 0

Arkku
Arkku

Reputation: 42149

I think the major problem in your approach is that it's way too complicated with multiple different places where you read input and seek around in the input. RLE can be done in one pass, there should not be a need to seek to the previous characters. One way to solve this is to change the logic into looking at the previous characters and how many times they have been repeated, instead of trying to look ahead at future characters. For instance:

int repeatCount = 0;
int previousChar = EOF;
int currentChar; // type changed to 'int' for fgetc input

while ((currentChar = fgetc(source)) != EOF) {
    if (currentChar != previousChar) {
        // print out the previous run of repeated characters
        outputRLE(previousChar, repeatCount, destination);
        // start a new run with the current character
        previousChar = currentChar;
        repeatCount = 1;
    } else {
        // same character repeated
        ++repeatCount;
    }
}
// output the final run of characters at end of input
outputRLE(previousChar, repeatCount, destination);

Then you can just implement outputRLE to do the output to print out a run of the character c repeated count times (note that count can be 0); here's the function declaration:

void outputRLE(const int c, const int count, FILE * const destination)

You can do it pretty much the same way as in your current code, although it can be simplified greatly by combining the fwrite and two fprintfs to a single fprintf. Also, you might want to think what happens if the escape character 'Q' appears in the input, or if there is a run of 10 or more repeated characters. Deal with those cases in outputRLE.


An unrelated problem in your code is that the return type of main should be int, not void.

Upvotes: 1

Vikram Bhat
Vikram Bhat

Reputation: 6246

There is no need to use Fseek to rewind as u have done , Here is a code that is have written without using it by using simple counter & current sequence character.

C implementation:

#include<stdio.h>
#include<stdlib.h>

void main()
{ 
    FILE *source = fopen("Test.txt", "r");
    FILE *destination = fopen("Dest.txt", "w");
    char currentChar;
    char seqChar;
    int count = 0;

    while(1) {
      int flag = (fread(&currentChar, sizeof(char),1, source) == 0); 

      if(flag||seqChar!=currentChar) {

         if(count>3) {
           char ch = 'Q';
           int k = count;
           char str[100];
           int digits = sprintf(str,"%d",count);
           fwrite(&ch,sizeof(ch),1,destination);
           fwrite(&seqChar,sizeof(ch),1,destination);
           fwrite(&str,sizeof(char)*digits,1,destination);
         }
         else {
           for(int i=0;i<count;i++) 
              fwrite(&seqChar,sizeof(char),1,destination);
         }
         seqChar = currentChar;
         count =1;
      }

     else count++;

     if(flag)
       break;
    }

   fclose(source);
   fclose(destination);
}

Upvotes: 2

M Oehm
M Oehm

Reputation: 29126

Your code has various problems. First, I'm not sure whether you should read straight from the file. In your case, it might be better to read the source string to a text buffer first with fgets and then do the encoding. (I think in your assignment, you should only encode letters. If source is a regular text file, it will have at least one newline.)

But let's assume that you need to read straight from the disk: You don't have to go backwards. You already habe two variables for the current and the next char. Read the next char from disk once. Before reading further "next chars", assign the :

int carSucc, carCorr;             // should be ints for getc

carSucc = getc(source);           // read next character once before loop 
while (carSucc != EOF) {          // test for end of input stream
    int carCorr = next;           // this turn's char is last turn's "next"

    carSucc = getc(source);
    // ... encode ...
}

The going forward and backward makes the loop complicated. Besides, what happens if the second read read zero characters, i.e. has reached the end of the file? Then you backtrace once and go into the second loop. That doesn't look as if it was intended.

Try to go only forward, and use the loop above as base for your encoding.

Upvotes: 1

Related Questions