wlyles

Reputation: 2316

Issues with characters at the end of an istream

I'm writing a parser, and I was having trouble parsing identifiers (anything that's valid as a C++ variable name) and unclosed string literals (anything starting with ", but missing the closing ") when they appear at the end of my input. I think it's because the lexer (TokenStream) uses std::noskipws in these cases and builds the token character by character. Here is where I believe I have narrowed down the problem (shown for only one of the two cases, as the other uses very similar logic):

std::string TokenStream::get()
{
    char c;
    (*input) >> c; // input is of type istream*

    // other cases...

    if (c == '"')
    {
        std::string s = stringFromChar(c); // just makes a string from the char.
        char d;
        while (true) // 1)
        {
            (*input) >> std::noskipws >> d;
            std::cout << d; // 2)
            if (d == '"')
            {
                s += d;
                (*input) >> std::skipws;
                break; 
            }
            s += d;
        }
        return s;
    }

    // other cases...
}

Note that this function is supposed to just generate tokens from the input in a stream-like fashion. Now, if I input either an identifier (like asdf) or an unclosed string (like "asdf), the program hangs, and the line marked 2) outputs the last character of the input (in my examples, f) over and over again forever.

I've solved this problem by using a check for input->eof(), but my question is this:

Why does the loop (marked 1) in comments) keep executing when I hit the end of stream, and why does it just print that last character read every time through the loop?

Upvotes: 0

Views: 212

Answers (1)

Csq

Reputation: 5845

Let's look at the loop in question line by line:

    while (true) // 1)

That loops forever unless a break is encountered.

    {
        (*input) >> std::noskipws >> d;

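Read a character. If no character can be read (for example, at the end of the stream), the stream sets its fail flag and d is left unchanged. Nothing here checks that flag, and once it is set every later read fails the same way.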

        std::cout << d; // 2)

Print d. After a failed read, this is still the last character that was read successfully.

        if (d == '"')

Nope, the last character was not " (as specified in the question)

        {
            s += d;
            (*input) >> std::skipws;
            break; 
        }
        s += d;
    }

Therefore the break is never encountered and the last character is printed in an endless loop.
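
To see this behaviour in isolation, here is a minimal sketch. The helper name readUnchecked is made up for this illustration and is not part of the asker's TokenStream; it reads from a std::istringstream a fixed number of times without ever checking the stream state, mirroring the unchecked read in the loop above.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper for illustration only: performs `attempts` unchecked
// reads on `text` and returns the value of d after each one, concatenated.
std::string readUnchecked(const std::string& text, int attempts)
{
    std::istringstream in(text);
    std::string seen;
    char d = '?';
    for (int i = 0; i < attempts; ++i)
    {
        in >> std::noskipws >> d; // result deliberately ignored, as in the question
        seen += d;                // after a failed read, d still holds the old value
    }
    return seen;
}
```

Calling readUnchecked("asdf", 7) returns "asdffff": four successful reads, followed by three failed reads that each leave d as 'f'. With while (true) in place of the bounded loop, that trailing f would repeat forever.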


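Fix: always use a while loop like this for input, so that the loop ends as soon as a read fails:

char ch;
while ((*input) >> ch) { // input is an istream* in the question, so dereference it

    // the read succeeded: ch contains a new letter, deal with it

}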
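
Applied to the string-literal case from the question, the same pattern might look like the sketch below. The function readStringLiteral and its closed out-parameter are assumptions made for this example, not the asker's actual code; the function takes the stream by reference and expects the opening " to have been consumed already.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the rest of a string literal whose opening quote
// was already consumed. `closed` reports whether the closing quote was found.
std::string readStringLiteral(std::istream& input, bool& closed)
{
    std::string s = "\"";
    closed = false;
    char d;
    input >> std::noskipws;  // keep whitespace inside the literal
    while (input >> d)       // stops as soon as a read fails (e.g. end of stream)
    {
        s += d;
        if (d == '"')
        {
            closed = true;
            break;
        }
    }
    input >> std::skipws;    // restore the default for the caller
    return s;
}
```

For the input asdf" rest this returns "asdf" (with both quotes) and sets closed to true. For the unclosed input asdf it returns "asdf and sets closed to false instead of hanging, so the caller can decide how to report the unterminated literal.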

Upvotes: 1
