nail fei

Reputation: 2329

Why can BufferedReader.readLine read a line that doesn't have a line separator?

The Java 1.8 Javadoc for BufferedReader.readLine says: "Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed."

Then I have a text file like this:

the first line
the second line

Note: the last character of the second line is 'e'; that is to say, there is no carriage return or line feed after it.

Here is my demo code:

public void process() throws IOException{
    BufferedReader br = new BufferedReader(new FileReader("demo.txt"));
    String line;
    while((line=br.readLine())!=null){
        System.out.println(line);
    }
    br.close();
}

The actual output:

 the first line
 the second line

My question is: why can the readLine method return the second line, even though it doesn't end with a line separator (\n, \r, or \r\n)?
I know there is an end of file (EOF), but the Javadoc doesn't seem to say explicitly that EOF also acts as a line separator.

If I use Scanner instead of BufferedReader, with the code below:

public void testScan() throws IOException{
    Scanner scan = new Scanner(new FileInputStream("demo.txt"));
    String line;
    while((line=scan.nextLine())!=null){
        System.out.println(line);
    }
    scan.close();
}

then the output is:

the first line
the second line
Exception in thread "main" java.util.NoSuchElementException: No line found
    at java.util.Scanner.nextLine(Scanner.java:1540)
    at com.demo.Demo.testScan(Demo.java:39)
    at com.demo.Demo.main(Demo.java:49)

Upvotes: 11

Views: 7129

Answers (2)

eis

Reputation: 53563

the Javadoc doesn't seem to say explicitly that EOF also acts as a line separator.

I think you're confusing line separator with line terminator.

A line separator just separates lines from each other. Given a line separator ; and the input one;two;three, you'd get the lines one, two and three. However, given the same input with ; acting as a line terminator instead, you'd get only the lines one and two, since the last line is not terminated.

In practice this means that if EOF really were a line separator, you'd get extra data. As EOF is technically not a character but a condition indicating that the file has ended, having EOF as a line separator would have wild consequences.
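
To make the distinction concrete, here is a minimal Java sketch. The class and method names (SeparatorVsTerminator, splitAsSeparator, splitAsTerminator) are made up for illustration and are not part of any library; the strict terminator variant deliberately drops an unterminated tail, which is not what readLine() does.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparatorVsTerminator {

    // Separator semantics: the delimiter sits BETWEEN lines, so whatever
    // follows the last delimiter is still a line (possibly an empty one).
    static List<String> splitAsSeparator(String input, char separator) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = input.indexOf(separator, start)) >= 0) {
            lines.add(input.substring(start, end));
            start = end + 1;
        }
        lines.add(input.substring(start));
        return lines;
    }

    // Strict terminator semantics: a line only counts once its delimiter
    // has been seen, so an unterminated tail is dropped.
    static List<String> splitAsTerminator(String input, char terminator) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = input.indexOf(terminator, start)) >= 0) {
            lines.add(input.substring(start, end));
            start = end + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(splitAsSeparator("one;two;three", ';'));  // [one, two, three]
        System.out.println(splitAsTerminator("one;two;three", ';')); // [one, two]
    }
}
```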

However, given the javadoc:

Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.

I think the terminology is misused there as well. Either the Javadoc should talk about separating instead of terminating, or it should mention EOF as one of the conditions terminating a line, or the implementation should not treat the last one as a separate line.

From Wikipedia:

Two ways to view newlines, both of which are self-consistent, are that newlines either separate lines or that they terminate lines. If a newline is considered a separator, there will be no newline after the last line of a file. Some programs have problems processing the last line of a file if it is not terminated by a newline. On the other hand, programs that expect newline to be used as a separator will interpret a final newline as starting a new (empty) line. Conversely, if a newline is considered a terminator, all text lines including the last are expected to be terminated by a newline. If the final character sequence in a text file is not a newline, the final line of the file may be considered to be an improper or incomplete text line, or the file may be considered to be improperly truncated.

So it does seem readLine() has these mixed up.

IMO the readLine() Javadoc should say something like:

A line is considered to be terminated at the end of the file or by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.

or, for a slightly vaguer wording, something similar to what Scanner.nextLine() says:

This method returns the [..] current line, excluding any line separator at the end

with the addition that it returns null when the end of the file is the only input left.
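
As a rough check of the behaviour discussed here, the following sketch feeds BufferedReader from a StringReader rather than from demo.txt (an assumption made only so the example is self-contained). ReadLineEofDemo and readAllLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineEofDemo {

    // Collects every line readLine() returns until it signals end of input with null.
    static List<String> readAllLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Last line has no terminator: it is still returned.
        System.out.println(readAllLines("the first line\nthe second line"));
        // Last line has a terminator: no extra empty line appears.
        System.out.println(readAllLines("the first line\nthe second line\n"));
    }
}
```

Both calls print [the first line, the second line]: end of input closes an unterminated last line, while a trailing newline does not start a new empty one.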

Upvotes: 10

juhist

Reputation: 4314

Because it's programmed that way.

Really, it's what the user of the method wants. If the last line is missing a line separator at the end, it will read until EOF so that no data is lost. You don't want to lose an entire line because of a missing line separator.

Practically all similar functions work the same way. For example, the fgets() function in the C library also works that way. So does f.readline() in Python.

Edit: Scanner also works in a similar way; the difference is that Scanner throws an exception, whereas BufferedReader returns null, once all lines have been read.
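
With Scanner, the exception can be avoided by checking hasNextLine() before each nextLine() call instead of comparing the result with null. A minimal sketch, reading from a String rather than demo.txt so that it is self-contained; ScannerLinesDemo and readAllLines are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerLinesDemo {

    // Reads lines only while the Scanner reports that another line is available,
    // so nextLine() is never called on exhausted input.
    static List<String> readAllLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextLine()) {
                lines.add(scan.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // The unterminated last line is returned, and no exception is thrown afterwards.
        System.out.println(readAllLines("the first line\nthe second line"));
    }
}
```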

Upvotes: 10
