AndrWeisR

Reputation: 1226

How to ignore some of Java's Files.lines end-of-line delimiters

Java's Files.lines method reads all lines from a file as a Stream, breaking the file into lines at the following delimiters:

\u000D followed by \u000A, CARRIAGE RETURN followed by LINE FEED
\u000A, LINE FEED
\u000D, CARRIAGE RETURN

I have files that contain the odd occurrence of \u000D, CARRIAGE RETURN which I do not want to treat as a new line, to be consistent with the way that grep (Windows) doesn't treat just a single \u000D as a newline marker. I want to process the lines in the file as a stream, but is there a way I can get a stream that doesn't use a single \u000D as a newline marker, using just CR/LF or LF? I have to use Java 8.

My problem is that I am getting grep to return the line number along with its matches, but because of the difference in EOL delimiters, Files.lines(path).skip(numLines) doesn't land on the same line when I try to skip to the line number returned by grep.
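To illustrate the mismatch, here is a minimal sketch. It uses BufferedReader.lines(), whose readLine documents the same three terminators listed above, on an in-memory string rather than a real file. The class name CrDemo and the sample text are made up for the example.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original question.
final class CrDemo {

    // Splits the way the JDK line readers do: on "\r\n", "\n" and a lone "\r".
    static List<String> jdkLines(String text) {
        return new BufferedReader(new StringReader(text))
                .lines()
                .collect(Collectors.toList());
    }

    // Counts lines the way an LF-based tool would: only '\n' ends a line.
    static int lfLineCount(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                count++;
            }
        }
        // A final line without a trailing '\n' still counts as a line.
        if (!text.isEmpty() && text.charAt(text.length() - 1) != '\n') {
            count++;
        }
        return count;
    }
}
```

For the text "one\rstill one\ntwo\n", jdkLines returns three lines while lfLineCount returns 2, so every line number after the stray CR is off by one.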

Upvotes: 0

Views: 466

Answers (2)

Stephen C

Reputation: 718768

Let's assume that you are doing byte-wise input ...

A scalable, efficient solution avoids holding the entire file in memory and avoids creating a String object for each line of input that you skip. This is one way to do it:

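// imports: java.io.BufferedInputStream, java.io.File, java.io.FileInputStream, java.io.InputStream
File f = ...
InputStream is = new BufferedInputStream(new FileInputStream(f));
int lineCounter = 1;   // 1-based number of the line the stream is positioned at
int wantedLine = 42;
int b = 0;
// Skip whole lines, treating only '\n' as a terminator; a lone '\r' is just data.
while (lineCounter < wantedLine && b != -1) {
    do {
        b = is.read();
        if (b == '\n') {
            lineCounter++;
        }
    } while (b != -1 && b != '\n');
}
if (lineCounter == wantedLine) {
    // the stream is now positioned at the first byte of wantedLine
    // do stuff
}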

Notes:

  1. I know this is a bit clunky, and it would be possible to do away with the nested loop, but this code is intended to be illustrative of an approach.
  2. You could possibly get better performance by using ByteBuffer, but it makes the code more complicated, especially if you are unfamiliar with the Buffer APIs.
  3. You could do something similar with a BufferedReader.
  4. For production-quality code, you should use try-with-resources to manage the InputStream resource.
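As a rough illustration of notes 3 and 4, here is a minimal Java 8 sketch of a character-based variant that exposes the lines as a lazy Stream. The class name LfLines and its methods are hypothetical, not part of any library. It treats only LF as a terminator and strips the CR of a CR/LF pair, so a lone CR stays inside its line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper: a line stream that splits on LF and CR/LF only.
final class LfLines {

    // Reads up to the next '\n' or end of input; returns null when nothing is left.
    static String readLine(Reader in) throws IOException {
        int c = in.read();
        if (c == -1) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        while (c != -1 && c != '\n') {
            sb.append((char) c);
            c = in.read();
        }
        // Drop the CR only when it is immediately followed by the LF that ended the line.
        if (c == '\n' && sb.length() > 0 && sb.charAt(sb.length() - 1) == '\r') {
            sb.setLength(sb.length() - 1);
        }
        return sb.toString();
    }

    // Wraps the reader in a lazy stream; closing the stream closes the reader.
    static Stream<String> lines(Reader reader) {
        BufferedReader in = new BufferedReader(reader);
        Iterator<String> it = new Iterator<String>() {
            private String next;
            private boolean done;

            @Override
            public boolean hasNext() {
                if (next == null && !done) {
                    try {
                        next = readLine(in);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    done = (next == null);
                }
                return next != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String line = next;
                next = null;
                return line;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(
                        it, Spliterator.ORDERED | Spliterator.NONNULL),
                false)
            .onClose(() -> {
                try {
                    in.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
    }
}
```

A possible use, assuming path points at the file and n is the 1-based line number reported by grep: try (Stream<String> s = LfLines.lines(Files.newBufferedReader(path))) { s.skip(n - 1).findFirst(); }. Unlike the byte-wise loop, this still creates a String for every skipped line, so it trades some efficiency for the Stream API.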

Upvotes: 1

user4910279


Try this.

Stream.of(Files.readString(path).split("\r?\n"))
    .filter(...
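Note that Files.readString was added in Java 11, while the question is restricted to Java 8. A rough Java 8 equivalent of the same idea might look like the sketch below. The class name WholeFileLines is hypothetical, and UTF-8 is an assumed encoding. Like the snippet above, it loads the whole file into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: whole-file read, split on LF or CR/LF only.
final class WholeFileLines {

    // A lone '\r' does not match this pattern, so it stays inside its line.
    private static final Pattern EOL = Pattern.compile("\r?\n");

    static Stream<String> lines(Path path) throws IOException {
        // Assumes the file is UTF-8; adjust the charset if it is not.
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        return EOL.splitAsStream(text);
    }
}
```

With that in place, WholeFileLines.lines(path).skip(n - 1).findFirst() should line up with an LF-based line number n, as long as the file fits comfortably in memory.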

Upvotes: 0
