Reputation: 1168
Please help me understand why the following program behaves differently depending on its input.
The program creates a test text file and a chain of Boost filters (filtering_istream) from one source and one filter. Then it tries to read some lines.
#include <iostream>
#include <fstream>
#include <string>
#include <boost/iostreams/device/file_descriptor.hpp>
#include <boost/iostreams/filtering_stream.hpp>
class my_filter : public boost::iostreams::input_filter
{
public:
    explicit my_filter(std::ostream& s) : m_out(s)
    {}
    template<typename Source>
    int get(Source& src)
    {
        int c = boost::iostreams::get(src);
        if(c == EOF || c == boost::iostreams::WOULD_BLOCK)
            return c;
        if(c == '\r')
            return boost::iostreams::WOULD_BLOCK;
        if(c == '\n')
        {
            m_out << m_str << std::endl;
            m_str = "";
        }
        else
        {
            m_str += c;
        }
        return c;
    }
private:
    std::ostream& m_out;
    std::string m_str;
};
int main()
{
    using namespace std;
    boost::iostreams::filtering_istream file;
    const std::string fname = "test.txt";
    std::ofstream f(fname, ios::out);
    f << "Hello\r\n";
f << "11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111\r\n";
    f << "World!\r\n";
    f.close();
    file.push(my_filter(std::cout));
    file.push(boost::iostreams::file_descriptor(fname));
    std::string s;
    while(std::getline(file, s))
    {}
    return 0;
}
Online compilation with clang displays the expected result:
But if I change the string "111...111" from 128 ones to 127 ones (or 255, and so on), the result differs:
This behavior seems incorrect to me.
Note: the length of "111...111" (127 ones) correlates with the default buffer_size in the boost::iostreams::filtering_istream::push method...
file.push(my_filter(std::cout), default_buf_size=...)
You may see and run code here: code_example
I find it strange that under some conditions returning WOULD_BLOCK allows reading to continue, while under other conditions it is treated as the end of the file. But according to the documentation:
WOULD_BLOCK - indicates that input is temporarily unavailable
So it doesn’t indicate the end of stream.
Upvotes: 2
Views: 2104
Reputation: 6866
What are you trying to do with this part?
if(c == '\r')
return boost::iostreams::WOULD_BLOCK;
If you're trying to ignore \r characters then you should just skip over them and read another character from the source. The Boost docs have an example that shows exactly this:
#include <ctype.h>                        // isalpha
#include <cstdio>                         // EOF
#include <boost/iostreams/categories.hpp> // input_filter_tag
#include <boost/iostreams/operations.hpp> // get, WOULD_BLOCK
using namespace std;
using namespace boost::iostreams;
struct alphabetic_input_filter {
    typedef char             char_type;
    typedef input_filter_tag category;
    template<typename Source>
    int get(Source& src)
    {
        int c;
        // Keep reading until EOF, WOULD_BLOCK, or an alphabetic character.
        while ( (c = boost::iostreams::get(src)) != EOF &&
                c != WOULD_BLOCK &&
                !isalpha((unsigned char) c) )
            ;
        return c;
    }
};
This removes all non-alphabetic characters from the source (see https://www.boost.org/doc/libs/1_68_0/libs/iostreams/doc/concepts/input_filter.html).
Now as to why exactly you're seeing the behavior above, this is basically what's happening:
You are returning WOULD_BLOCK from get() exactly on a buffer boundary, before any characters have been stored in the current buffer.
This gets called from a read() implementation that looks like this (see the two lines with comments towards the end):
template<>
struct read_filter_impl<any_tag> {
    template<typename T, typename Source>
    static std::streamsize read
        (T& t, Source& src, typename char_type_of<T>::type* s, std::streamsize n)
    {   // gcc 2.95 needs namespace qualification for char_traits.
        typedef typename char_type_of<T>::type     char_type;
        typedef iostreams::char_traits<char_type>  traits_type;
        for (std::streamsize off = 0; off < n; ++off) {
            typename traits_type::int_type c = t.get(src);
            if (traits_type::is_eof(c))
                return check_eof(off);
            if (traits_type::would_block(c)) // It gets HERE
                return off;                  // and returns 0
            s[off] = traits_type::to_char_type(c);
        }
        return n;
    }
};
(https://www.boost.org/doc/libs/1_70_0/boost/iostreams/read.hpp)
The value returned by read() is then used at the end of indirect_streambuf's underflow():
    // Read from source.
    std::streamsize chars =
        obj().read(buf.data() + pback_size_, buf.size() - pback_size_, next_);
    if (chars == -1) {
        this->set_true_eof(true);
        chars = 0;
    }
    setg(eback(), gptr(), buf.data() + pback_size_ + chars);
    return chars != 0 ?
        traits_type::to_int_type(*gptr()) :
        traits_type::eof();
(https://www.boost.org/doc/libs/1_70_0/boost/iostreams/detail/streambuf/indirect_streambuf.hpp)
So because it hasn't read any characters in the current buffer, it interprets this as end of file and gives up completely.
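To make the interaction between the two excerpts easier to follow, here is a toy model that does not use Boost at all. Everything in it (kEof, kWouldBlock, ToyFilteredSource, toy_read, chars_delivered) is a made-up stand-in, and it only mimics the control flow quoted above: a read loop that stops early on "would block", and a refill step that treats a zero-length read as the end of input.

```cpp
#include <cstddef>
#include <ios>       // std::streamsize
#include <string>
#include <utility>   // std::move
#include <vector>

// Stand-ins for EOF and WOULD_BLOCK (values chosen only for this sketch).
constexpr int kEof = -1;
constexpr int kWouldBlock = -2;

// Toy source plus filter: yields the characters of `data`, but reports
// kWouldBlock for every '\r', like the filter in the question.
struct ToyFilteredSource {
    explicit ToyFilteredSource(std::string d) : data(std::move(d)) {}

    int get()
    {
        if (pos >= data.size())
            return kEof;
        const char c = data[pos++];
        return c == '\r' ? kWouldBlock
                         : static_cast<int>(static_cast<unsigned char>(c));
    }

    std::string data;
    std::size_t pos = 0;
};

// Mimics the quoted read(): returns the number of characters stored,
// which is 0 when "would block" is the very first thing it sees.
std::streamsize toy_read(ToyFilteredSource& src, char* s, std::streamsize n)
{
    for (std::streamsize off = 0; off < n; ++off) {
        const int c = src.get();
        if (c == kEof)
            return off != 0 ? off : -1;   // same idea as check_eof(off)
        if (c == kWouldBlock)
            return off;
        s[off] = static_cast<char>(c);
    }
    return n;
}

// Mimics repeated refills: keeps refilling until one refill yields nothing,
// then gives up. Returns how many characters reached the reader in total.
std::size_t chars_delivered(const std::string& text, std::streamsize buffer_size)
{
    ToyFilteredSource src(text);
    std::vector<char> buf(static_cast<std::size_t>(buffer_size));
    std::size_t total = 0;
    for (;;) {
        std::streamsize chars = toy_read(src, buf.data(), buffer_size);
        if (chars == -1)
            chars = 0;
        if (chars == 0)
            return total;   // an empty refill is reported as end of input
        total += static_cast<std::size_t>(chars);
    }
}
```

Calling chars_delivered with a buffer size of 128 on the question's text delivers every non-'\r' character when the middle line has 128 ones, but stops right after the second refill when it has 127 ones, because the next refill begins with '\r' and comes back empty.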
(too long for a comment)
You are not the user of that function; that function is actually the user of your code. Your filter sits between the source and the reader. WOULD_BLOCK does indicate that input is temporarily unavailable, but it is ultimately the reader's decision if and when it tries again. The streambuf does the best it can in this regard: it processes whatever it managed to get from the source and then tries to read again (that's why it doesn't stop the first time you return WOULD_BLOCK, after Hello). But if the source doesn't return anything more and the streambuf buffer is empty, it basically has no choice but to conclude that it has reached the end of the source. It has no more characters to process in the buffer and can't get more from the source.
You will see the same behavior if you put two consecutive \r characters anywhere. Try this for example:
f << "Hello\r\r\n";
f << "nothing\r\n";
f << "World!\r\n";
Note the two \r characters after Hello. And this has nothing to do with buffer sizes. It's just a read from the source that doesn't return anything.
Upvotes: 3