Thanatos

Reputation: 44256

getline() sets failbit and skips last line

I'm using std::getline() to enumerate through the lines in a file, and it's mostly working. It's left me curious however - std::getline() is skipping the very last line in my file, but only if it's blank. Using this minimal example:

#include <iostream>
#include <string>

int main()
{
        std::string line;
        while(std::getline(std::cin, line))
                std::cout << "Line: “" << line << "”\n";
        return 0;
}

If I feed it this:

Line A
Line B
Line C

I get those lines back at me. But this:

Line A
Line B
Line C
[* line is present but blank, ie, the file end is: "...B\nLine C\n" *]

(I unfortunately can't have a blank line in SO's little code box thing...) So, first file has three lines ( ["Line A", "Line B", "Line C"] ), second file has four ( ["Line A", "Line B", "Line C", ""] )

This to me seems wrong - I have a four line file, and enumerating it with getline() leaves me with 3. What's really got me scratching my head is that this is exactly what the standard says it should do. (21.3.7.9)

Even Python has similar behaviour (but it gives me the newlines too; C++ chops them off). Is this some weird thing where C++ is expecting lines to be terminated, and not separated, by '\n', and I'm feeding it differently?

Edit

Clearly, I need to expand a bit here. I've met up with two philosophies of determining what a "line" in a file is:

  • Each line is terminated by a newline.
  • Each line is separated by a newline.

Of course, YMMV as to what a newline is.

I've always treated these as two completely different schools of thought. One earlier point I tried to make was to ask if the C++ standard was explicitly or merely implicitly following the first.

Thus, getting back to the question at hand, the second example, which can be thought of as "A\nB\nC\n" has four lines, following the separated philosophy. Now, does C++ explicitly follow a terminated philosophy, or is this just the way the standard is? (They don't record much reasoning in standards...) I'm hesitant to say it was explicit, since it's a bit painful to tell if you have what vim calls a "noeol" file with C++. (Python, for example, leaves the newlines in, so you can tell that way)
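For what it's worth, telling a "noeol" file apart does seem possible with plain std::getline(): if eofbit is set right after a successful read, that line ran into end-of-file rather than a '\n'. Below is a minimal sketch of that idea, not a definitive implementation. The helper name read_separated is hypothetical (it is not part of the standard library), and it assumes '\n' is the only line ending in play.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read lines following the "separated" philosophy,
// so that "A\nB\nC\n" yields four lines, the last one empty.
std::vector<std::string> read_separated(std::istream& in)
{
        std::vector<std::string> lines;
        std::string line;
        bool last_hit_eof = false;
        while (std::getline(in, line)) {
                lines.push_back(line);
                // eofbit after a successful read means this line ended at
                // end-of-file, not at a '\n' (the "noeol" case).
                last_hit_eof = in.eof();
        }
        // If the last line read was terminated by '\n' (or nothing was read
        // at all), the separated view has one more, empty, line.
        if (!last_hit_eof)
                lines.push_back("");
        return lines;
}
```

Wrapping "Line A\nLine B\nLine C\n" in a std::istringstream and passing it in gives four entries, the last one empty; the same text without the final '\n' gives three.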

Since everything in Windows follows the separated philosophy, I'm looking for something a bit deeper than "Both examples have 3 lines."

(Curiously, where is Mac? Terminated or separated?)

Upvotes: 3

Views: 2396

Answers (3)

Robᵩ

Reputation: 168626

The C++ standard has this to say about getline:

C++ 2003, 21.3.7.9/5

[getline(is, str, delim)] … extracts characters from is … until any of the following occurs:

  • end-of-file occurs on the input sequence …
  • c == delim [N.b. default delim is '\n'] for the next available input character c (in which case, c is extracted but not appended)
  • str.max_size() characters are stored

Bracketed editorial comments added.

To put it in your vernacular, getline treats '\n' as a terminator, not a separator.
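A small sketch to make the terminator behaviour concrete, using std::istringstream in place of std::cin so the input is explicit. The helper name read_lines is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Illustrative helper: collect every line std::getline() yields from text.
std::vector<std::string> read_lines(const std::string& text)
{
        std::istringstream in(text);
        std::vector<std::string> lines;
        std::string line;
        // Each successful call consumes characters up to and including
        // the next '\n', or up to end-of-file if no '\n' remains.
        while (std::getline(in, line))
                lines.push_back(line);
        return lines;
}
```

With this, "Line A\nLine B\nLine C" and "Line A\nLine B\nLine C\n" both come back as three lines. Only a second trailing '\n' ("...Line C\n\n") produces a fourth, empty, line, because that empty line has its own terminator.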

Upvotes: 4

Ben Burnett

Reputation: 1564

When you say the last line is blank, what do you mean? If you mean that the second-to-last line ends with a carriage return/line feed, then you don't technically have a last line, and it sounds like getline() is behaving as I would expect it to.

Consider your example:

Line A
Line B
Line C

This is actually three lines that end in \r\n, and the third line's \r\n is what puts the cursor on the 4th line. There isn't actually a 4th line.

Upvotes: 0

Amardeep AC9MF

Reputation: 19044

I count only three lines in both your data sets. The first data set is simply missing a line ending character which is present in the second data set.

Your editor represents an empty line after 'Line C' for convenience. If you pipe its contents through wc -l you will find it says 3.
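For comparison, wc -l counts newline characters, not "lines" in the editor's sense. A rough C++ equivalent of that count is sketched below; the function name count_newlines is only illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>

// Illustrative helper: count '\n' characters in a stream, which is the
// quantity wc -l reports.
std::size_t count_newlines(std::istream& in)
{
        // istreambuf_iterator reads raw characters without skipping whitespace.
        return static_cast<std::size_t>(
                std::count(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>(),
                           '\n'));
}
```

Fed "Line A\nLine B\nLine C\n" through a std::istringstream, this returns 3. Note that the same text without the final '\n' returns 2 under this count, even though getline() still hands back three lines for it.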

Upvotes: 1
