Rusty Lemur

Reputation: 1885

Print std::vector<char> without extraneous characters

I'm building a crude replacement for gunzip using the Poco libraries. Right now the design is to read a gzipped file (currently hardcoded to "data.gz") and print the decompressed output to stdout. I'm very close, but the program prints some extra characters, as shown below.

Setup:

Ubuntu 19.10 with libpocofoundation62 (and all libpoco*62) installed via apt install libpoco-dev

My C++ code, modeled heavily off of https://github.com/pocoproject/poco/issues/1507

#define _GLIBCXX_USE_CXX11_ABI 0
#include <fstream>
#include <iostream>
#include <Poco/InflatingStream.h>
#include <Poco/String.h>
#include <vector>

using std::cout;
using std::endl;

int main() {

        cout.sync_with_stdio(false);
        std::ifstream istr("data.gz", std::ios::binary); // In the future will take filename as argument
        Poco::InflatingInputStream inflating_stream(istr, Poco::InflatingStreamBuf::STREAM_GZIP);
        std::vector<char> buf(16); // In the future will be larger, like 1024
        while (true) {
                inflating_stream.read(buf.data(), buf.size());
                size_t gcount = inflating_stream.gcount();

                if (!gcount && inflating_stream.eof()) {
                        inflating_stream.reset();
                }

                // This way outputs all the correct data, but also some extraneous characters at the end
                if (gcount) {
                        cout << buf.data();
                }

                /* This way works, but is slower
                if (gcount) {
                        for (auto i: buf) {
                                std::cout << i;
                        }
                }
                */

                else {
                        break;
                }
        }

        return 0;
}

My data.gz

$ zcat data.gz 
foo, bar
baz, qux
quux, quuz
corge, grault
garply, waldo
fred, plugh
xyzzy, thud

My compilation command:

g++ mygunzip.cpp -o /tmp/mygunzip -lPocoFoundation -lPocoUtil && chmod +x /tmp/mygunzip

The result of running mygunzip:

$ /tmp/mygunzip
foo, bar
baz, qu��f�lUx
quux, quuz
cor��f�lUge, grault
garpl��f�lUy, waldo
fred, p��f�lUlugh
xyzzy, thud��f�lU
ugh
xyzzy, thud��f�lU

So all of the correct data is being printed out, but there is extraneous data held in buf.data() after each read that is also being printed out. What would be the most elegant way of handling the extraneous data? I've included in the comments another way that works, and it prints out all of the correct data without the extraneous characters. But it seems to be much slower, so I'm looking to improve on the other solution if possible.

Upvotes: 0

Views: 82

Answers (1)

Lightness Races in Orbit

Reputation: 385284

You gave cout a char* and told it to print it.

You didn't ask it to print x chars, so it treated the thing as a C-string and kept going until it encountered a null terminator ('\0'). Your buffer isn't null-terminated, so it read past the bytes you actually filled. What else could it do?

Use cout.write(buf.data(), gcount) instead.
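
As a rough sketch of that fix, the read loop could look something like the following. The helper name copy_chunks and its parameters are made up for illustration and are not part of Poco; it takes any std::istream, so it can be tried without a gzip file. It also leaves out the reset() call from the question, so it assumes a single stream with no concatenated gzip members.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a Poco API): copies everything from `in` to `out`
// in fixed-size chunks, writing only the bytes each read actually produced.
std::size_t copy_chunks(std::istream& in, std::ostream& out,
                        std::size_t chunk_size = 16) {
    std::vector<char> buf(chunk_size);
    std::size_t total = 0;
    while (true) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) {
            break;  // nothing was read: end of input or a stream error
        }
        // write() takes an explicit length, so it never looks for a '\0'
        // and never touches stale bytes left over from an earlier read.
        out.write(buf.data(), got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

Since Poco::InflatingInputStream is a std::istream, calling copy_chunks(inflating_stream, std::cout, 1024) should slot into the program from the question. Writing a whole chunk per call should also avoid the per-character overhead of the loop in the commented-out version.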

Upvotes: 3
