Reputation: 1885
I'm building a crude replacement for gunzip using the Poco libraries. Right now the design is to read a gzipped file (currently hardcoded to "data.gz") and print the decompressed output to stdout. I'm very close, but the program prints some extra characters, as shown below.
Setup:
Ubuntu 19.10 with libpocofoundation62 (and all libpoco*62) installed via apt install libpoco-dev
My C++ code, modeled heavily on https://github.com/pocoproject/poco/issues/1507
#define _GLIBCXX_USE_CXX11_ABI 0
#include <fstream>
#include <iostream>
#include <Poco/InflatingStream.h>
#include <Poco/String.h>
#include <vector>

using std::cout;
using std::endl;

int main() {
    cout.sync_with_stdio(false);
    std::ifstream istr("data.gz", std::ios::binary); // In the future will take filename as argument
    Poco::InflatingInputStream inflating_stream(istr, Poco::InflatingStreamBuf::STREAM_GZIP);
    std::vector<char> buf(16); // In the future will be larger, like 1024
    while (true) {
        inflating_stream.read(buf.data(), buf.size());
        size_t gcount = inflating_stream.gcount();
        if (!gcount && inflating_stream.eof()) {
            inflating_stream.reset();
        }
        // This way outputs all the correct data, but also some extraneous characters at the end
        if (gcount) {
            cout << buf.data();
        }
        /* This way works, but is slower
        if (gcount) {
            for (auto i: buf) {
                std::cout << i;
            }
        }
        */
        else {
            break;
        }
    }
    return 0;
}
My data.gz
$ zcat data.gz
foo, bar
baz, qux
quux, quuz
corge, grault
garply, waldo
fred, plugh
xyzzy, thud
My compilation command:
g++ mygunzip.cpp -o /tmp/mygunzip -lPocoFoundation -lPocoUtil && chmod +x /tmp/mygunzip
The result of running mygunzip:
$ /tmp/mygunzip
foo, bar
baz, qu��f�lUx
quux, quuz
cor��f�lUge, grault
garpl��f�lUy, waldo
fred, p��f�lUlugh
xyzzy, thud��f�lU
ugh
xyzzy, thud��f�lU
So all of the correct data is being printed out, but there is extraneous data held in buf.data() after each read that is also being printed out. What would be the most elegant way of handling the extraneous data? I've included in the comments another way that works, and it prints out all of the correct data without the extraneous characters. But it seems to be much slower, so I'm looking to improve on the other solution if possible.
Upvotes: 0
Views: 82
Reputation: 385284
You gave cout a char* and told it to print it.
You didn't ask it to print x chars, so it treated the thing as a C-string and kept going until it encountered a null terminator ('\0'). What else could it do?
Use cout.write(buf.data(), gcount) instead.
Upvotes: 3