Jeff Wofford

Reputation: 11547

Is the inconsistency of C++'s istream::eof() a bug in the spec or a bug in the implementation?

The following program demonstrates an inconsistency in the way that std::istream (specifically in my test code, std::istringstream) sets eof().

#include <sstream>
#include <cassert>

int main(int argc, const char * argv[])
{
    // EXHIBIT A:
    {
        // An empty stream doesn't recognize that it's empty...
        std::istringstream stream( "" );
        assert( !stream.eof() );        // (Not yet EOF. Maybe should be.)
        // ...until I read from it:
        const int c = stream.get();
        assert( c < 0 );                // (We received EOF, not a character.)
        assert( stream.eof() );         // (Now we're EOF.)
    }
    // THE MORAL: EOF only happens when actually attempting to read PAST the end of the stream.

    // EXHIBIT B:
    {
        // A stream that still has data beyond the current read position...
        std::istringstream stream( "c" );
        assert( !stream.eof() );        // (Clearly not yet EOF.)
        // ... clearly isn't eof(). But when I read the last character...
        const int c = stream.get();
        assert( c == 'c' );             // (We received something legit.)
        assert( !stream.eof() );        // (But we're already EOF?! THIS ASSERT FAILS.)
    }
    // THE MORAL: EOF happens when reading the character BEFORE the end of the stream.

    // Conclusion: MADNESS.
    return 0;
}

So, eof() "fires" when you read the character before the actual end-of-file. But if the stream is empty, it only fires when you actually attempt to read a character. Does eof() mean "you just tried to read off the end?" or "If you try to read again, you'll go off the end?" The answer is inconsistent.

Moreover, whether the assert fires depends on the compiler and its standard library. Apple Clang 4.1, for example, fires the assertion (it raises eof() when reading the preceding character); GCC 4.7.2 does not.

This inconsistency makes it hard to write sensible loops that read through a stream but handle both empty and non-empty streams well.

OPTION 1:

while( stream && !stream.eof() )
{
    const int c = stream.get();    // BUG: Wrong if stream was empty before the loop.
    // ...
}

OPTION 2:

while( stream )
{
    const int c = stream.get();
    if( stream.eof() )
    {
        // BUG: Wrong when c in fact got the last character of the stream.
        break;
    }
    // ...
}

So, friends, how do I write a loop that parses through a stream, handling every character in turn, and that stops without fuss when it hits EOF, or never starts at all if the stream is empty to begin with?

And okay, the deeper question: I have the intuition that using peek() could somehow work around this eof() inconsistency, but...holy crap! Why the inconsistency?

Upvotes: 9

Views: 1556

Answers (5)

James Kanze

Reputation: 153909

It's not a bug, in the sense that it's the intended behavior. It is not the intent that you test eof() until after input has failed. Its main purpose is for use inside extraction functions, where, in early implementations, the fact that std::streambuf::sgetc() returned EOF didn't mean that it would return EOF the next time it was called: the intent was that any time sgetc() returned EOF (now std::char_traits<>::eof()), this would be memorized, and the stream would make no further calls to the streambuf.
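To make the "internal use" concrete, here is a minimal sketch of a hand-written extractor that talks to the streambuf directly and sets eofbit the moment sgetc() reports end of file. The type Digits and the extractor itself are hypothetical, invented only for illustration; the point is that eofbit can end up set on a perfectly successful extraction, which is exactly what the question's Exhibit B observes.

```cpp
#include <cctype>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical type used only for illustration: a run of decimal digits.
struct Digits {
    std::string text;
};

// Extractor that records eofbit as soon as sgetc() returns EOF, so the
// stream will make no further calls to the streambuf.
std::istream& operator>>(std::istream& in, Digits& d) {
    std::istream::sentry guard(in);   // skips leading whitespace
    if (!guard) {
        return in;                    // the sentry has already set the state
    }
    using traits = std::istream::traits_type;
    std::streambuf* sb = in.rdbuf();
    std::ios_base::iostate state = std::ios_base::goodbit;
    d.text.clear();
    for (;;) {
        traits::int_type ch = sb->sgetc();        // look without consuming
        if (traits::eq_int_type(ch, traits::eof())) {
            state |= std::ios_base::eofbit;       // memorize end of file
            break;
        }
        char c = traits::to_char_type(ch);
        if (!std::isdigit(static_cast<unsigned char>(c))) {
            break;                                // leave c in the stream
        }
        d.text += c;
        sb->sbumpc();                             // now consume it
    }
    if (d.text.empty()) {
        state |= std::ios_base::failbit;          // nothing was extracted
    }
    in.setstate(state);
    return in;
}
```

Reading "42" with this extractor succeeds and sets eofbit; reading "42 x" succeeds without setting it, because the lookahead found a space rather than end of file.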

Practically speaking: we really need two eof(): one for internal use, as above, and another which will reliably state that failure was due to having reached end of file. As it is, given something like:

double aDouble;
std::istringstream s( "1.23e+" );
s >> aDouble;

There's no way of detecting that the error is due to a format error, rather than the stream not having any more data. In this case, the internal eof should return true (because we have seen end of file, when looking ahead, and we want to suppress all further calls to the streambuf extractor functions), but the external one should be false, because there was data present (even after skipping initial whitespace).
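A small sketch of that ambiguity, assuming a conforming library (checked here against the behavior of libstdc++ and libc++): a malformed number and a completely empty stream leave the stream in the same state. The helper name state_after_double is hypothetical.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: tries to extract a double from text and returns
// the resulting stream state flags.
std::ios_base::iostate state_after_double(const std::string& text) {
    std::istringstream s(text);
    double aDouble = 0.0;
    s >> aDouble;
    return s.rdstate();
}
```

Both state_after_double("1.23e+") and state_after_double("") yield failbit | eofbit, so the flags alone cannot tell a format error from a lack of data, while "1.23" yields eofbit alone because the extraction succeeded.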

If you're not implementing an extractor function, of course, you should never test ios_base::eof() until you've actually had an input failure. It was never the intent that this would provide any useful information (which makes one wonder why they defined ios_base::good(): the fact that it returns false if eof() is set means that it can provide no reliable information until fail() returns true, at which point we know that it will return false, so there's no point in calling it).

And I'm not sure what your problem is. Because the stream cannot know in advance what your next input will be (e.g. whether it will skip whitespace or not), it cannot know in advance whether your next input will fail because of end of file. The idiom adopted is clear: try the input, then test whether it succeeded. There is no other way, because no alternative can be implemented. Pascal did it differently, but a file in Pascal was typed: you could only read one type from it, so it could always read ahead one element under the hood, and report end of file if this read-ahead failed. Not having an advance end-of-file indication is the price we pay for being able to read more than one type from a file.
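Applied to the character loop from the question, "try the input, then test" might look like this minimal sketch. The function name collect_chars is hypothetical, and appending to a string stands in for whatever per-character processing is needed.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Processes each character in order. The input is attempted first and the
// stream is tested afterwards, so an empty stream never enters the body
// and the last character is never dropped.
std::string collect_chars(std::istream& in) {
    std::string seen;
    char c;
    while (in.get(c)) {   // get(char&) returns the stream; false once it fails
        seen += c;        // process c here
    }
    return seen;
}
```

When the loop exits, eof() tells you whether it stopped because the input ran out, which is the one moment that flag is meaningful.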

Upvotes: 1

n. m. could be an AI

Reputation: 119877

Never, ever check for eof alone.

The eof flag (which is the same as the eofbit bit flag in a value returned by rdstate()) is set when end-of-file is reached during an extract operation. If there were no extract operations, eofbit is never set, which is why your first check returns false.

However, eofbit is no indication of whether the operation was successful. For that, check failbit|badbit in rdstate(). failbit means "there was a logical error", and badbit means "there was an I/O error". Conveniently, there's a fail() function that returns true exactly when rdstate() & (failbit|badbit) is nonzero. Even more conveniently, there's an operator bool() function that returns !fail(). So you can do things like while (stream.read(buffer, sizeof buffer)) { ... }.
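One subtlety with the read() form, shown in a minimal sketch below: read() fails (setting eofbit and failbit) whenever it gets fewer characters than requested, so the last partial chunk never reaches the loop body. gcount() still reports how many characters that final call extracted. The function name read_in_chunks and the 4-byte buffer are assumptions for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads the whole stream in fixed-size chunks, testing the stream after
// each read() call.
std::string read_in_chunks(std::istream& in) {
    std::string result;
    char buffer[4];   // deliberately tiny so the last chunk is usually partial
    while (in.read(buffer, sizeof buffer)) {
        result.append(buffer, sizeof buffer);   // a full chunk arrived
    }
    // The failed final read may still have extracted a few characters.
    result.append(buffer, static_cast<std::size_t>(in.gcount()));
    return result;
}
```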

If the operation has failed, you may check eofbit, badbit and failbit separately to figure out why it has failed.
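A minimal sketch of that diagnosis, assuming whitespace-separated integers as input: loop on the extraction itself, and only after it fails look at the individual flags. The function name sum_ints and its messages are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Sums whitespace-separated integers, then reports why the loop stopped.
// The flags are inspected only after an extraction has failed.
std::string sum_ints(std::istream& in, long& total) {
    total = 0;
    long n;
    while (in >> n) {
        total += n;
    }
    if (in.bad()) {
        return "I/O error";
    }
    if (in.eof()) {
        return "end of input";   // failed because nothing was left to read
    }
    return "format error";       // failbit set without eofbit
}
```

Note that input such as "1 2 +" would still be reported as "end of input", which is the ambiguity described in another answer here.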

Upvotes: 5

Ben Voigt

Reputation: 283634

The behavior is somewhat subtle. eofbit is set when an attempt is made to read past the end of the file, but that may or may not cause failure of the current extraction operation.

For example:

std::ifstream blah;    // requires <fstream>
// assume the file got opened
int i, j;
blah >> i;
if (!blah.eof())
    blah >> j;

If the file contains 142<EOF>, then the sequence of digits is terminated by end of file, so eofbit is set AND the extraction succeeds. Extraction of j won't be attempted, because the end of file has already been encountered.

If the file contains 142 <EOF>, then the sequence of digits is terminated by whitespace (extraction of i succeeds). eofbit is not set yet, so blah >> j will be executed, and it will reach end of file without finding any digits, so it will fail.

Notice how the innocuous-looking whitespace at the end of file changed the behavior.
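The same two cases can be reproduced without a file, as a minimal sketch that substitutes std::istringstream for the opened ifstream. The struct Outcome and the function try_two_ints are hypothetical names.

```cpp
#include <sstream>
#include <string>

// What happened while running the two extractions from the answer.
struct Outcome {
    int i = 0;
    bool eof_after_i = false;   // was eofbit already set after reading i?
    bool j_attempted = false;
    bool j_failed = false;
};

Outcome try_two_ints(const std::string& contents) {
    std::istringstream blah(contents);   // stands in for the opened file
    Outcome out;
    int j = 0;
    blah >> out.i;
    out.eof_after_i = blah.eof();
    if (!blah.eof()) {
        out.j_attempted = true;
        blah >> j;
        out.j_failed = blah.fail();
    }
    return out;
}
```

With "142" the first extraction succeeds and sets eofbit, so j is never attempted; with "142 " eofbit is still clear, j is attempted, and it fails.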

Upvotes: 0

rici

Reputation: 241711

What compiler / standard C++ library are you using? I tried it with gcc 4.6.3/4.7.2 and clang 3.1, and all of them worked just fine (i.e. the assertion does not fire).

I think you should report this as a bug in your tool-chain, since my reading of the standard accords with your intuition that eof() should not be set as long as get() is able to return a character.

Upvotes: 1

Dietmar Kühl

Reputation: 153830

The eof() flag is only useful to determine whether you hit end of file after some operation. Its primary use is to avoid an error message if reading reasonably failed because there wasn't anything more to read. Trying to control a loop or something similar using eof() is bound to fail. In all cases you need to check, after you tried to read, whether the read was successful. Before the attempt, the stream can't know what you are going to read.

The semantics of eof() are effectively defined as "this flag gets set when reading the stream caused the stream buffer to return a failure". If I recall correctly, the statement isn't quite that easy to find, but this is what it comes down to. At some point the standard also says that the stream is allowed to read more than it has to in some situations, which may cause eof() to be set when you don't necessarily expect it. One such example is reading a character: the stream may end up detecting that there is nothing following that character and set eof().

If you want to handle an empty stream, it's trivial: look at something from the stream and proceed only if you know it's not empty:

if (stream.peek() != std::char_traits<char>::eof()) {
    do_what_needs_to_be_done_for_a_non_empty_stream();
}
else {
    do_something_else();
}
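The same peek() test can drive the whole loop the question asks for, as a minimal sketch: look ahead before every get(), so the body never runs on an empty stream and every character is handled. The function name collect_with_peek is hypothetical. In libstdc++ and libc++, peek() at end of input sets eofbit but not failbit, so the stream is not reported as failed afterwards.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Copies every character of the stream, checking with peek() before each
// get(), so the loop body never starts on an empty stream.
std::string collect_with_peek(std::istream& in) {
    using traits = std::istream::traits_type;
    std::string seen;
    // peek() returns traits::eof() at end of input (or if the stream is no
    // longer good) without extracting anything.
    while (!traits::eq_int_type(in.peek(), traits::eof())) {
        seen += traits::to_char_type(in.get());   // process the character
    }
    return seen;
}
```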

Upvotes: 9
