Reputation: 103
I am currently working on an exercise from K. N. King's C Programming: A Modern Approach. Here is the exercise:
(Question start)
Of the many techniques for compressing the contents of a file, one of the simplest and fastest is known as run-length encoding. This technique compresses a file by replacing sequences of identical bytes by a pair of bytes: a repetition count followed by a byte to be repeated. For example, suppose that the file to be compressed begins with the following sequence of bytes (shown in hexadecimal):
46 6F 6F 20 62 61 72 21 21 21 20 20 20 20 20
The compressed file will contain the following bytes:
01 46 02 6F 01 20 01 62 01 61 01 72 03 21 05 20
Run-length encoding works well if the original file contains many long sequences of identical bytes. In the worst case (a file with no repeated bytes), run-length encoding can actually double the length of the file. (Question end)
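As a rough illustration of the scheme the exercise describes, here is a sketch that encodes a buffer in memory rather than a file. The name rle_encode and its signature are made up for this example and do not come from the book; capping each run at 255 is also an assumption, made so that the count always fits in one byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: run-length encode n bytes from in[] into out[].
   out must have room for 2 * n bytes (the worst case, no repeats).
   Returns the number of bytes written to out[]. */
size_t rle_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0;   /* read index into in[]   */
    size_t w = 0;   /* write index into out[] */

    assert(n == 0 || (in != NULL && out != NULL));

    while (i < n) {
        unsigned char value = in[i];
        unsigned char count = 0;

        /* Count the run; stop at 255 so the count fits in one byte
           (assumption: longer runs are split into several pairs). */
        while (i < n && in[i] == value && count < 255) {
            count++;
            i++;
        }
        out[w++] = count;   /* repetition count first ... */
        out[w++] = value;   /* ... then the byte itself   */
    }
    return w;
}
```

Fed the 15 bytes from the exercise, this should produce the 16 bytes listed as the compressed output.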
My question concerns the following part of my code for this exercise, and one line of it in particular:
int main(int argc, char *argv[])
{
    FILE *fp, *fpout;
    char *outfile;
    unsigned char value, next, count;
    long int position;

    if (argc != 2)
    {
        printf("Error: Incorrect usage of program. Usage: c22p7.exe file\n");
        exit(EXIT_FAILURE);
    }
    if ((fp = fopen(argv[1], "rb")) == NULL)
    {
        printf("Error: Unable to open file\n");
        exit(EXIT_FAILURE);
    }
    outfile = malloc(strlen(argv[1]) + 5);
    strcpy(outfile, argv[1]);
    strcat(outfile, ".RLE");
    if ((fpout = fopen(outfile, "wb")) == NULL)
    {
        printf("Error: Unable to open file\n");
        exit(EXIT_FAILURE);
    }
    free(outfile);
    while (fread(&value, sizeof(unsigned char), 1, fp) > 0)
    {
        count = 1;
        position = ftell(fp);
        while (fread(&next, sizeof(unsigned char), 1, fp) > 0 && next == value)
        {
            count++;
        }
        fwrite(&count, sizeof(unsigned char), 1, fpout);
        fwrite(&value, sizeof(unsigned char), 1, fpout);
        fseek(fp, -1L, SEEK_CUR); /* THIS PARTICULAR LINE */
    }
    fclose(fp);
    fclose(fpout);
    exit(EXIT_SUCCESS);
}
Regarding fseek(fp,-1L,SEEK_CUR);, my rationale was that the program keeps reading bytes until it has read the first byte that is different. It then moves back by one byte (hence the -1L), so that the next iteration of the loop reads that byte again. For example:
01 01 01 01 02 02
It reads all the 01 bytes until it reads the first 02; then fseek() moves the file position back by 1 byte, so that the next iteration of the loop reads the first 02 again. However, if I implement the code this way, it doesn't work.
fseek(fpin, position + (amount - 1), SEEK_SET);
^ This works, however. Here position is the file position after reading the first byte, and amount is the number of bytes already read. I understand how this particular line of code works, but I do not understand why my SEEK_CUR method doesn't work. Thank you all for the help.
Upvotes: 0
Views: 83
Reputation: 58132
fseek works the way it's supposed to, but your code is buggy.
Think about what happens when you reach the end of the file. For simplicity, think of a file that is only one byte long. In the current version of the code, the inner fread will fail, but you don't have any special handling for that. So the fseek will back up the file position by 1 byte, i.e. back to position 0, and the outer fread will reread the same character you just read, ad infinitum. Your outer loop will never exit.
Basically, you're assuming that the last iteration of fread in the inner loop will always have advanced the file position by 1, which your fseek will effectively undo. But that is not true when end-of-file is reached; in that case fread does not advance the file position.
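One way to check this for yourself is a small experiment like the sketch below. The function name position_after_eof_read is hypothetical, and using tmpfile() for a one-byte scratch file is just a convenience for the demonstration.

```c
#include <stdio.h>

/* Hypothetical demo: create a one-byte temporary file, read that byte,
   attempt a second read (which hits end-of-file), and report where the
   file position ends up. Returns -1 if the setup itself fails. */
long position_after_eof_read(void)
{
    FILE *fp = tmpfile();
    unsigned char byte = 0x41;
    size_t got;
    long pos;

    if (fp == NULL)
        return -1L;
    if (fwrite(&byte, 1, 1, fp) != 1) {
        fclose(fp);
        return -1L;
    }
    rewind(fp);

    /* First read succeeds and advances the position from 0 to 1. */
    got = fread(&byte, 1, 1, fp);
    if (got != 1) {
        fclose(fp);
        return -1L;
    }

    /* Second read finds nothing left: it returns 0 and reads no bytes. */
    got = fread(&byte, 1, 1, fp);
    if (got != 0) {
        fclose(fp);
        return -1L;
    }

    pos = ftell(fp);   /* still 1: the failed read did not move it */
    fclose(fp);
    return pos;
}
```

Since the position is still 1 after the failed read, a following fseek(fp, -1L, SEEK_CUR) lands on position 0 and the only byte gets read again.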
With the other version of your code, you'll reach fseek with position == 1 and count == 1, and seek to file position 1 + (1 - 1) = 1. That's the end of the file, so the outer fread will fail and the loop will terminate as desired.
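If you want to keep the SEEK_CUR idea, one possible repair is to seek back only when the inner loop really did read a differing byte. The sketch below is an assumption about how that could look, not your original program: rle_seek_back and the mismatch flag are names invented here, and the cap at 255 is added so that count cannot overflow an unsigned char.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: run-length encode stream in to stream out.
   Returns 0 on success, -1 on an I/O error. */
int rle_seek_back(FILE *in, FILE *out)
{
    unsigned char value, next, count;

    assert(in != NULL && out != NULL);

    while (fread(&value, 1, 1, in) == 1) {
        int mismatch = 0;   /* set only if a differing byte was consumed */

        count = 1;
        /* Stop before reading once count reaches 255, so nothing extra
           is consumed in that case. */
        while (count < 255 && fread(&next, 1, 1, in) == 1) {
            if (next != value) {
                mismatch = 1;
                break;
            }
            count++;
        }

        if (fwrite(&count, 1, 1, out) != 1 || fwrite(&value, 1, 1, out) != 1)
            return -1;

        /* Undo the extra read only if there was one. At end-of-file the
           inner fread consumed nothing, so there is nothing to undo. */
        if (mismatch && fseek(in, -1L, SEEK_CUR) != 0)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

With the input 01 01 01 01 02 02, this writes 04 01 02 02 and then stops, because the last inner fread fails at end-of-file and no seek follows it.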
Upvotes: 1
Reputation: 33601
You do not need to back up the file position. This will actually produce wrong results for the RLE.
If you must, there is an easier way. Use ungetc(chr,fp);
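A minimal sketch of the ungetc route, assuming the input is read with fgetc: the differing byte is pushed back onto the stream instead of seeking. The function name rle_ungetc is made up for illustration, and the 255 cap is an added assumption so the count fits in one byte.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: run-length encode stream in to stream out,
   using ungetc() to return the first non-matching byte to the stream.
   Returns 0 on success, -1 on an I/O error. */
int rle_ungetc(FILE *in, FILE *out)
{
    int value;
    int next = EOF;

    assert(in != NULL && out != NULL);

    while ((value = fgetc(in)) != EOF) {
        int count = 1;

        /* count < 255 is tested first, so no byte is consumed once the
           run has reached the maximum a single byte can hold. */
        while (count < 255 && (next = fgetc(in)) == value)
            count++;

        if (fputc(count, out) == EOF || fputc(value, out) == EOF)
            return -1;

        /* If the loop ended on a real, differing byte, push it back so
           the outer loop reads it as the start of the next run. At EOF
           there is nothing to push back. */
        if (count < 255 && next != EOF) {
            if (ungetc(next, in) == EOF)
                return -1;
        }
    }
    return ferror(in) ? -1 : 0;
}
```

Only one character of pushback is needed per run, which is the amount the C standard guarantees for ungetc.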
But you can do this with a simpler loop. And if you're going to read input one character at a time, use fgetc instead of fread.
You really only need a loop that does a single fgetc at the top of the loop.
You already have the basics: A variable that remembers the current character and another that remembers the previous one [which is the RLE char].
When the character changes, you just have to dump the accumulated RLE pair, then set the new character as the RLE char with a starting count of one. Starting the count at one (the new character has already been read and counted) is the crucial reason why you don't have to backtrack.
Here's a version that works [tested with the specified data]:
#include <stdio.h>

void
rleout(int rlechr,int rlecnt)
{
    if (rlecnt > 0) {
        fputc(rlecnt,stdout);
        fputc(rlechr,stdout);
    }
}

int
main(void)
{
    int curchr;
    int rlechr = -1;
    int rlecnt = 0;

    while (1) {
        // get next character
        curchr = fgetc(stdin);

        // hit EOF
        if (curchr == EOF)
            break;

        // starting a new RLE
        if (curchr != rlechr) {
            // output the current RLE pair
            rleout(rlechr,rlecnt);

            // set new RLE pair
            rlechr = curchr;
            rlecnt = 1;
            continue;
        }

        // advance number of consecutive chars
        ++rlecnt;

        // if we're using the max space in a byte, we _have_ to dump the RLE
        // pair
        if (rlecnt >= 255) {
            rleout(rlechr,rlecnt);
            rlecnt = 0;
        }
    }

    // output final pair [if any]
    rleout(rlechr,rlecnt);

    return 0;
}
Upvotes: 0