nanika


Efficient way of getting chars out of a gzipped file

I am using a function that needs the content of a gzipped XML file. Its signature is:

endgoalFn(const char* s, int len)

Below is the code I use for decompressing:

std::ifstream file;
file.open(resultFile, std::ios_base::in | std::ios_base::binary);
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::gzip_decompressor());
in.push(file);

Below is the code I use to copy the content into a string and pass it to endgoalFn:

std::stringstream buffer;
boost::iostreams::copy(in, buffer);
std::string content(buffer.str()); // str() already returns a temporary, so std::move is redundant


endgoalFn(&content[0], content.size());

The three lines before the call to endgoalFn take x ms, depending on the size of the compressed file.

Is there an alternative way to get the first argument of endgoalFn directly from the filtering_streambuf, so that I can shave a few ms off x?

Update

It was suggested in the comments to measure with an optimized build.

The example code contains two approaches (the original one and the one suggested by @sehe):

2.4M    resultFile
Consuming 7975424 bytes
withSStream, time taken: 136ms
Consuming 7975424 bytes
withoutSStream, time taken: 160ms
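
For readers who want to reproduce this kind of comparison, here is a minimal sketch of how the two approaches could be timed. It is not the code that produced the numbers above. All names (elapsedMs, compareApproaches) are hypothetical, a std::istringstream stands in for the gzip filter chain so the sketch needs no Boost, and `buffer << in.rdbuf()` stands in for boost::iostreams::copy.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: runs `work` once and returns the elapsed wall time in ms.
template <typename F>
long long elapsedMs(F&& work) {
    auto const start = std::chrono::steady_clock::now();
    work();
    auto const stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Approach 1: drain the stream into a stringstream, then copy out a string.
// (Assumption: operator<< on rdbuf() stands in for boost::iostreams::copy.)
std::string withSStream(std::istream& in) {
    std::stringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Approach 2: build the string straight from stream-buffer iterators.
std::string withoutSStream(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Times both approaches on the same in-memory payload (a stand-in for the
// decompressing stream) and checks that they produce identical content.
void compareApproaches(std::string const& payload) {
    std::istringstream a(payload), b(payload);
    std::string viaSStream, viaIterators;
    auto const t1 = elapsedMs([&] { viaSStream = withSStream(a); });
    auto const t2 = elapsedMs([&] { viaIterators = withoutSStream(b); });
    assert(viaSStream == viaIterators);
    std::cout << "withSStream, time taken: " << t1 << "ms\n"
              << "withoutSStream, time taken: " << t2 << "ms\n";
}
```

A single run is noisy, so repeating each measurement several times on the real decompressing stream, in an optimized build, would give more trustworthy numbers.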


Answers (1)

sehe


As others have noted, make sure you measure in a Release (optimized) build configuration.

Besides, there is no need to go through a string stream first:

std::ifstream file;
file.open(resultFile, std::ios_base::in | std::ios_base::binary);
// filtering_stream (not filtering_streambuf) is a std::istream, which the reads below require
boost::iostreams::filtering_stream<boost::iostreams::input> in;
in.push(boost::iostreams::gzip_decompressor());
in.push(file);

Now you can construct the string directly from the stream:

std::string content(std::istreambuf_iterator<char>(in), {});
endgoalFn(content.data(), content.size());

More Ideas

If the XML files can be large, and if endgoalFn can handle partial input (i.e. you can call it in a loop), you can do a buffered read:

char buf[4096];
while (in.read(buf, sizeof(buf)) || in.gcount()) {
    endgoalFn(buf, in.gcount());
}
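
To see how that loop condition treats the final, shorter chunk, here is a minimal sketch that can be run without Boost. It assumes a std::istringstream as a stand-in for the decompressing stream. The names feedInChunks and chunkSizesFor are hypothetical helpers, and the chunk size is a runtime parameter purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: passes the contents of `in` to `consume` in pieces of
// at most `chunkSize` bytes.
template <typename Consumer>
void feedInChunks(std::istream& in, std::size_t chunkSize, Consumer consume) {
    assert(chunkSize > 0); // a zero-sized read always succeeds, so the loop would never end
    std::vector<char> buf(chunkSize);
    // read() reports failure on the last, short chunk, but gcount() still
    // tells us how many bytes were actually stored in the buffer.
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) || in.gcount() > 0) {
        consume(buf.data(), static_cast<std::size_t>(in.gcount()));
    }
}

// Records the chunk sizes a consumer sees for an in-memory input of `total` bytes.
std::vector<std::size_t> chunkSizesFor(std::size_t total, std::size_t chunkSize) {
    std::istringstream in(std::string(total, 'x'));
    std::vector<std::size_t> sizes;
    feedInChunks(in, chunkSize, [&](char const*, std::size_t n) { sizes.push_back(n); });
    return sizes;
}
```

With these helpers, chunkSizesFor(10000, 4096) yields 4096, 4096 and 1808. An input whose size is an exact multiple of the chunk size produces no trailing zero-length call, and an empty input produces no call at all.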

UPDATE - Live Demo

See both demonstrated with a slightly modified sample that is self-contained:

Live On Coliru

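#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

namespace bio = boost::iostreams;

// Stand-in for the real consumer: just reports how many bytes it was given
void endgoalFn(char const*, std::size_t n) {
    std::cout << "Consuming " << n << " bytes\n";
}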

int main() {
    std::ifstream file;
    file.open("resultFile", std::ios::binary);

    bio::filtering_stream<bio::input> in;
    in.push(bio::gzip_decompressor());
    in.push(file);

#ifndef STREAMING
    std::string const content(std::istreambuf_iterator<char>(in), {});
    endgoalFn(content.data(), content.size());
#else 
    char buf[4096];
    while (in.read(buf, sizeof(buf)) || in.gcount()) {
        endgoalFn(buf, in.gcount());
    }
#endif
}

Compiled for STREAMING:

g++ -std=c++20 -O2 -Wall -pedantic -pthread main.cpp -DSTREAMING -lboost_iostreams -lz
cat a.out | gzip > resultFile
./a.out

Prints e.g.

Consuming 4096 bytes
Consuming 4096 bytes
// ...
Consuming 4096 bytes
Consuming 2856 bytes

