I am reading data from an input file and compressing it in C with the bzip2 library function `BZ2_bzCompress`. The compression itself succeeds, but I cannot write all of the compressed data to an output file. Only the first compressed line gets written. Am I missing something here?
```c
#include <stdio.h>
#include <string.h>
#include <bzlib.h>

int main()
{
    bz_stream bz;
    FILE* f_d;
    FILE* f_s;
    BZFILE* b;
    int bzerror = -10;
    unsigned int nbytes_in;
    unsigned int nbytes_out;
    char buf[3000] = {0};
    int result = 0;
    char buf_read[500];
    char file_name[] = "/path/file_name";
    long int save_pos;
    f_d = fopen ( "myfile.bz2", "wb+" );
    f_s = fopen(file_name, "r");
    if ((!f_d) && (!f_s)) {
        printf("Cannot open files");
        return(-1);
    }
    bz.opaque = NULL;
    bz.bzalloc = NULL;
    bz.bzfree = NULL;
    result = BZ2_bzCompressInit(&bz, 1, 2, 30);
    while (fgets(buf_read, sizeof(buf_read), f_s) != NULL)
    {
        bz.next_in = buf_read;
        bz.avail_in = sizeof(buf_read);
        bz.next_out = buf;
        bz.avail_out = sizeof(buf);
        printf("%s\n", buf_read);
        save_pos = ftell(f_d);
        fseek(f_d, save_pos, SEEK_SET);
        while ((result == BZ_RUN_OK) || (result == 0) || (result == BZ_FINISH_OK))
        {
            result = BZ2_bzCompress(&bz, (bz.avail_in) ? BZ_RUN : BZ_FINISH);
            printf("2 result:%d,in:%d,outhi:%d, outlo:%d \n",result, bz.total_in_lo32, bz.total_out_hi32, bz.total_out_lo32);
            fwrite(buf, 1, bz.total_out_lo32, f_d);
        }
        if (result == BZ_STREAM_END)
        {
            result = BZ2_bzCompressEnd(&bz);
        }
        printf("3 result:%d, out:%d\n", result, bz.total_out_lo32);
        result = BZ2_bzCompressInit(&bz, 1, 2, 30);
        memset(buf, 0, sizeof(buf));
    }
    fclose(f_d);
    fclose(f_s);
    return(0);
}
```
TL;DR: there are multiple problems, but the main one that explains the problem you asked about is likely that you compress each line of the file independently, instead of the whole file as a unit.
According to the docs of `BZ2_bzCompressInit`, the `bz_stream` argument should be allocated and initialized before the call. Yours is (automatically) allocated, but not (fully) initialized. It would be clearer and easier to change the declaration to `bz_stream bz = { 0 };` and then skip the assignments to `bz.opaque`, `bz.bzalloc`, and `bz.bzfree`.
You store, but do not really check, the return value of your `BZ2_bzCompressInit` call. It does eventually get tested in the condition of the inner `while` loop, but that condition only recognizes success and normal-completion codes. It never detects an error; it just stops looping.
Your handling of the input buffer is significantly flawed.
In the first place, you set the number of available input bytes incorrectly: `bz.avail_in = sizeof(buf_read);`. Since you're using `fgets()` to read data into the buffer, the full size of the buffer is never occupied by input data, because `fgets()` always writes a string terminator into the array. In fact, it can be worse than that. `fgets()` stops after a newline, so a successful read may provide as little as a single input byte.
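To see how far `sizeof(buf_read)` overstates the real input, this sketch measures what each `fgets()` call actually stores. The helper name `read_line_bytes` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line with fgets() and returns how many
   bytes were actually stored (not counting the terminator), or 0 at EOF
   or on error. This, not sizeof(buf), is what could go into bz.avail_in.
   Caveat: strlen() stops early if the line itself contains a '\0' byte. */
static size_t read_line_bytes(char *buf, size_t size, FILE *fp)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    return strlen(buf);
}
```

For input `"ab\ncd"`, successive calls return 3 (including the newline), then 2, then 0. None of them comes close to a 500-byte buffer.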
If you want to stick with `fgets()`, then you need to use `strlen()` to determine the number of bytes available from each read. I would suggest that you instead switch to `fread()`, which will more reliably fill the buffer, indicate with its return value how many bytes were read, and correctly handle inputs containing null bytes.
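Here is a small sketch of the `fread()` approach. The helper `read_chunk` and its `failed` out-parameter are hypothetical. The point is that the return value of `fread()` is exactly the byte count to hand to `bz.avail_in`.

```c
#include <stdio.h>

/* Hypothetical helper: fills buf with up to `size` bytes from fp.
   fread() reports exactly how many bytes it stored, null bytes included,
   so the result can be used directly as bz.avail_in. A short count means
   end of file or a read error; ferror() distinguishes the two. */
static size_t read_chunk(char *buf, size_t size, FILE *fp, int *failed)
{
    size_t n = fread(buf, 1, size, fp);
    *failed = (n < size && ferror(fp)) ? 1 : 0;
    return n;
}
```

Note that the input file should then be opened in binary mode (`"rb"`), so that the bytes compressed are exactly the bytes in the file.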
In the second place, you use `BZ2_bzCompress()` to compress each buffer of input as if it were a complete file. When you come to the end of the buffer, you finish a compression run and reinitialize the `bz_stream`. This will definitely interfere with decompressing, and it may explain why your program (seems to) compress only the first line of its input. You should be reading the whole content of the file (in suitably sized chunks) and feeding all of it to `BZ2_bzCompress(... BZ_RUN)` before you finish up. There should be one sequence of calls to `BZ2_bzCompress(... BZ_FINISH)`, and finally one call to `BZ2_bzCompressEnd()`, for the whole file, not per line.
You do not perform error detection or handling for any of your calls to standard library or bzip functions. You do handle the expected success-case return values for some of these, but you need to be prepared for errors, too.
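As one example on the standard-library side, here is a sketch of opening and closing the two files with every result checked. The helpers `open_files` and `close_files` are hypothetical. The separate checks matter: the question's `if ((!f_d) && (!f_s))` only fires when both opens fail, so a single failed `fopen()` slips through.

```c
#include <stdio.h>

/* Hypothetical helper: opens the input for reading and the output for
   writing, reporting which open failed. On failure nothing is left open
   and both handles are NULL. Returns 0 on success, -1 on failure. */
static int open_files(const char *in_path, const char *out_path,
                      FILE **in, FILE **out)
{
    *in = NULL;
    *out = NULL;

    *in = fopen(in_path, "rb");
    if (*in == NULL) {
        perror(in_path);
        return -1;
    }
    *out = fopen(out_path, "wb");
    if (*out == NULL) {
        perror(out_path);
        fclose(*in);
        *in = NULL;
        return -1;
    }
    return 0;
}

/* Hypothetical helper: closes both (non-NULL) handles. A failing fclose()
   on the output can mean buffered compressed data never reached the disk. */
static int close_files(FILE *in, FILE *out)
{
    int status = 0;
    if (fclose(in) != 0)
        status = -1;
    if (fclose(out) != 0) {
        perror("fclose");
        status = -1;
    }
    return status;
}
```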
There are some additional oddities:

- You declare `nbytes_in`, `nbytes_out`, `bzerror`, and `b`, but never use them.
- The `ftell()`/`fseek()` pair has no overall effect other than setting `save_pos`, which is not otherwise used.
- There is no need to `memset()` the output buffer to all-zeroes at the end of each line (or initially).