Selvaram G

Reputation: 748

Reading Newly Added Contents from Compressed Log File (log4j) using Boost's Zstd Filtering Stream

I'm currently working on a C++ program aimed at tailing a live log file stored on disk. The catch is that the log file is compressed using Zstd. I've been using Boost's (1.83) Zstd filtering stream for decompression, but I've hit a roadblock. It seems that the filtering stream isn't picking up newly added contents to the log file.

I've tried troubleshooting this but haven't had much luck. Does anyone have experience with this, or know of any resources I could consult? A simplified version of my code is below.

Thank you in advance for any help or suggestions you can provide!

#include <iostream>
#include <string>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/filter/zstd.hpp> // provides io::zstd_decompressor
#include <fstream>
#include <zstd.h>

namespace io = boost::iostreams;

int main() {
    // Create a filtering_streambuf with a Zstd decompressor
    io::filtering_streambuf<io::input> in;
    in.push(io::zstd_decompressor());
    std::ifstream file("compressed_file.txt.zst", std::ios_base::in | std::ios_base::binary);
    in.push(file);
    // Construct an istream from the underlying streambuf
    std::istream inputStream(&in);
    // Read lines from the input stream using std::getline()

    while (true) {

        std::string line;
        while (std::getline(inputStream, line)) {
            std::cout << "Line: " << line << std::endl;
        }

        if (inputStream.eof()) {
            inputStream.clear();
        }
    }
    
    return 0;
}

Upvotes: 1

Views: 117

Answers (1)

sehe

Reputation: 393664

I think decompression isn't your issue here. It's merely what manifests your issue. Even without decompression you'd run into the same problem: the stream stops at EOF.
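To make the EOF point concrete without any decompression involved, here is a minimal sketch of a polling reader for a plain text file. The names `poll_new_lines` and `pending` are made up for illustration. The no-op seek is an assumption on my part: some standard library implementations seem to need it before the file buffer will look at the file again, and it should be harmless elsewhere.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the complete lines that became available
// since the previous call. `pending` carries an unterminated trailing
// fragment from one call to the next.
std::vector<std::string> poll_new_lines(std::ifstream& in, std::string& pending) {
    assert(in.is_open());

    in.clear();                  // reset eofbit/failbit left by the previous pass
    in.seekg(0, std::ios::cur);  // no-op seek (assumption: nudges sticky-EOF implementations)

    std::vector<std::string> lines;
    std::string chunk;
    while (std::getline(in, chunk)) {
        if (in.eof()) {          // ran into EOF before '\n': the line is not finished yet
            pending += chunk;
            break;
        }
        lines.push_back(pending + chunk);
        pending.clear();
    }
    return lines;
}
```

You would call this in a loop with a short sleep between calls. Note that this only handles the plain file case. Whether the same reset works through a filter chain depends on the filter: a decompressor that has already seen end of input may not resume.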

It doesn't immediately look like iostreams has a facility like tail, but you can always use... tail:

Live On Coliru

#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/process.hpp>
#include <iostream>
using namespace std::string_literals;
namespace bp = boost::process;
namespace io = boost::iostreams;

int main() {
    bp::ipstream monitor;
    bp::child    tail("/usr/bin/tail",                               //
                      std::vector{"-f"s, "compressed_file.txt.gz"s}, //
                      bp::std_in.null(),                             //
                      bp::std_out = monitor);

    io::filtering_istream filter;
    filter.push(io::gzip_decompressor{});
    filter.push(monitor);

    for (std::string line; getline(filter, line);)
        std::cout << "line: " << line << "\n";

    tail.wait();
}

Testing with

gzip -k main.cpp; mv main.cpp.gz compressed_file.txt.gz
g++ -std=c++17 -O2 -Wall -pedantic -pthread main.cpp -lz -lboost_iostreams 
./a.out

Prints the expected:

line: #include <boost/iostreams/filter/gzip.hpp>
line: #include <boost/iostreams/filtering_stream.hpp>
line: #include <boost/process.hpp>
line: #include <iostream>
line: using namespace std::string_literals;
line: namespace bp = boost::process;
line: namespace io = boost::iostreams;
line: 
line: int main() {
line:     bp::ipstream monitor;
line:     bp::child    tail("/usr/bin/tail",                        //
line:                       std::vector{"compressed_file.txt.gz"s}, //
line:                       bp::std_in.null(),                      //
line:                       bp::std_out = monitor);
line: 
line:     io::filtering_istream filter;
line:     filter.push(io::gzip_decompressor{});
line:     filter.push(monitor);
line: 
line:     for (std::string line; getline(filter, line);)
line:         std::cout << "line: " << line << "\n";
line: 
line:     tail.wait();
line: }

BONUS

Of course, to drop the link dependencies on Boost Iostreams and zlib, consider:

#include <boost/process.hpp>
#include <iostream>
using namespace std::string_literals;
namespace bp = boost::process;

int main() {
    bp::pstream tail_out, unzip_out;
    bp::child                                       //
        tail("/usr/bin/tail",                       //
             std::vector{"-f"s},                    //
             bp::std_in < "compressed_file.txt.gz", //
             bp::std_out = tail_out),               //
        unzip("/usr/bin/zcat",                      //
              bp::std_in  = tail_out,               //
              bp::std_out = unzip_out);

    for (std::string line; getline(unzip_out, line);)
        std::cout << "line: " << line << "\n";

    return unzip.exit_code();
}

This has the same effect. Simpler still, let the shell build the pipeline:

#include <boost/process.hpp>
#include <iostream>
namespace bp = boost::process;

int main() {
    bp::pstream stream;
    bp::child pipeline("tail -f | zcat", bp::shell, bp::std_in < "compressed_file.txt.gz", bp::std_out = stream);

    for (std::string line; getline(stream, line);)
        std::cout << "line: " << line << "\n";

    return pipeline.exit_code();
}

Printing

line: #include <boost/process.hpp>
line: #include <iostream>
line: namespace bp = boost::process;
line: 
line: int main() {
line:     bp::pstream stream;
line:     bp::child pipeline("tail | zcat", bp::shell, bp::std_in < "compressed_file.txt.gz", bp::std_out = stream);
line: 
line:     for (std::string line; getline(stream, line);)
line:         std::cout << "line: " << line << "\n";
line: 
line:     return pipeline.exit_code();
line: }
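Taking the same idea one step further, you could drop Boost entirely. Below is a minimal sketch using POSIX `popen`, assuming a POSIX system. The helper name `for_each_line` is hypothetical. For the Zstd file from the question, the command might be something like `tail -c +1 -f compressed_file.txt.zst | zstd -dc`, assuming the `zstd` command-line tool is installed. `-c +1` makes `tail` start at the first byte, so the decompressor sees the stream from its beginning.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <stdexcept>
#include <string>
#include <stdio.h> // popen, pclose, getline (POSIX, not standard C++)

// Hypothetical helper: runs `command` through the shell and calls `on_line`
// for every line the command writes to its standard output.
// Returns the status reported by pclose().
int for_each_line(const std::string& command,
                  const std::function<void(const std::string&)>& on_line) {
    assert(on_line && "callback must be callable");

    FILE* pipe = ::popen(command.c_str(), "r");
    if (!pipe)
        throw std::runtime_error("popen failed");

    char*   buffer   = nullptr; // allocated and grown by getline
    size_t  capacity = 0;
    ssize_t length   = 0;
    while ((length = ::getline(&buffer, &capacity, pipe)) != -1) {
        std::string line(buffer, static_cast<size_t>(length));
        if (!line.empty() && line.back() == '\n')
            line.pop_back();    // strip the trailing newline, like std::getline
        on_line(line);
    }
    std::free(buffer);
    return ::pclose(pipe);
}
```

With a following `tail`, the loop only ends when the pipeline does. How promptly new lines show up also depends on the writer flushing its compressed output, which I have not verified for log4j.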

Upvotes: 0
