pistal

Reputation: 2456

Handling large gzfile in c++

char buffer[1001];
for (; !gzeof(m_fHandle); ) {
    gzread(m_fHandle, buffer, 1000);
    // ...
}

The file I'm handling is more than 1GB.

Do I load the entire file into the buffer, or should I malloc a buffer of the right size?

Or should I load it line by line? The file has a "\n" demarcating the EOL. If so, how do I do that for a gzfile in C++?

Upvotes: 2

Views: 6372

Answers (2)

sehe

Reputation: 393084

The zlib approach would be:

You can just call gzread with a limited buffer size repeatedly. If you can be sure that the max line length is, e.g., BUFLEN: See it Live On Coliru

#include <zlib.h>
#include <iostream>
#include <algorithm>

static const unsigned BUFLEN = 1024;

void error(const char* const msg)
{
    std::cerr << msg << "\n";
    exit(255);
}

void process(gzFile in)
{
    char buf[BUFLEN];
    char* offset = buf;

    for (;;) {
        int err, len = sizeof(buf)-(offset-buf);
        if (len == 0) error("Buffer too small for input line lengths");

        len = gzread(in, offset, len);

        if (len == 0) break;    
        if (len <  0) error(gzerror(in, &err));

        char* cur = buf;
        char* end = offset+len;

        for (char* eol; (cur<end) && (eol = std::find(cur, end, '\n')) < end; cur = eol + 1)
        {
            std::cout << std::string(cur, eol) << "\n";
        }

        // any trailing data in [eol, end) now is a partial line
        offset = std::copy(cur, end, buf);
    }

    // BIG CATCH: don't forget about trailing data without eol :)
    std::cout << std::string(buf, offset);

    if (gzclose(in) != Z_OK) error("failed gzclose");
}

int main()
{
    process(gzopen("test.gz", "rb"));
}

If you cannot know the maximum line size, I'd suggest abstracting it a bit more and deriving from std::basic_streambuf, overriding underflow, so you can use std::getline with an istream based on this buffer.

UPDATE Since you're new to C++, implementing your own streambuf is likely not a good idea. I recommend using a C++ library (instead of zlib).

E.g. Boost Iostream allows you to simply do this:

Live On Coliru

#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <string>

namespace io = boost::iostreams;

int main()
{   
    io::filtering_istream in;
    in.push(io::gzip_decompressor());
    in.push(io::file_source("my_file.txt"));
    // read from in using std::istream interface

    std::string line;
    while (std::getline(in, line, '\n'))
    {
         process(line); // your code :)
    }
}

Upvotes: 4

Lother

Reputation: 1797

You say this is a gzfile. That implies a binary format, where '\n' is not necessarily valid as an EOL marker (there is no real concept of EOL in binary files).

That said, in practice you have a couple of choices for buffer size. Loading the entire file into memory will certainly make the data easier for you, the developer, to work with. However, this is a costly solution in terms of the memory consumed for the task.

If memory is a concern then you need to work on the data in pieces. There is probably an optimal amount of data to try to fetch at a time and a lot of that will depend on the hardware architecture of the machine you have all the way from the CPU through cache lines, memory bus, SATA bus, and even the drives that hold the file itself.

If this is just a onesy-twosy kind of problem you're solving and you're running it on a modern computer, 1GB is probably OK to keep in memory. Just new a uint8_t[] the size of the file, read the whole thing in, and then parse the data.

Otherwise, you need to integrate your parsing of the file with the reading of the file.

Upvotes: 0
