Derek

Reputation: 11915

Java Scanner.nextLine() consumes newline character

I have a scanner set up that is working on an InputStream.

I am using Scanner.nextLine() to advance to each line, then doing some regular expression work on each line.

I have a regular expression that is basically like [\w\p{Z}]+?[;\n\r] to pick up anything to the end of that line, or just ONE thing, if they are semi-colon delimited.

So if my InputStream looks like

abcd;
xyz

It will pick up abcd;, but not xyz.

I think this is because the newline character at the end of the line of text is being consumed when nextLine() is called. Can anyone tell me how to fix this problem?

As an additional point of info, I am compiling the regex pattern with Pattern.DOTALL.

Thanks!

Upvotes: 5

Views: 49833

Answers (5)

Alan Moore

Reputation: 75222

Actually, you're the one that's causing the problem, by trying to consume a newline at the end of the last line. :-/ It's perfectly valid for the last line to end abruptly without a newline character, but your regex requires it to have one. You might be able to fix that by replacing the newline with an anchor or a lookahead, but there are much easier ways to go about this.
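The anchor/lookahead idea could look roughly like the sketch below. It keeps the question's character class but ends each field at a semicolon or at the end of the string returned by nextLine(), instead of requiring a newline. The class and method names (FirstFieldPerLine, firstFields) are made up for illustration, and unlike the original regex the match does not include the semicolon itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstFieldPerLine {
    // Question's character class, terminated by a lookahead for ';' or end of input
    // rather than by a newline that nextLine() has already stripped.
    private static final Pattern FIELD = Pattern.compile("[\\w\\p{Z}]+?(?=;|$)");

    // Returns the first field of every line (hypothetical helper).
    static List<String> firstFields(String input) {
        List<String> fields = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                Matcher m = FIELD.matcher(sc.nextLine());
                if (m.find()) {
                    fields.add(m.group());
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(firstFields("abcd;\nxyz")); // prints [abcd, xyz]
    }
}
```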

One is to override the default delimiter and iterate over the fields with next():

Scanner sc1 = new Scanner("abcd;\nxyz");
sc1.useDelimiter("[;\r\n]+");
while (sc1.hasNext())
{
  System.out.printf("%s%n", sc1.next());
}

The other is to iterate over the lines with nextLine() (using the default delimiter) and then split each line on semicolons:

Scanner sc2 = new Scanner("abcd;\nxyz");
while (sc2.hasNextLine())
{
  // nextLine() returns the line without its terminator
  for (String item : sc2.nextLine().split(";"))
  {
    System.out.printf("%s%n", item);
  }
}

Scanner's API is one of the most bloated and unintuitive I've ever worked with, but you can greatly reduce the pain of using it if you remember these two crucial points:

  1. Think in terms of matching the delimiters, not the fields (like you do with String's split()).
  2. Never call one of the nextXXX() methods without first calling the corresponding hasNextXXX() method.
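As a rough illustration of both points, the sketch below describes the delimiters (semicolons and line breaks) rather than the fields, and guards every nextXXX() call with the matching hasNextXXX() call. The class and method names (GuardedScan, sumInts) are invented for this example.

```java
import java.util.Scanner;

class GuardedScan {
    // Sums the integer tokens in the input and skips everything else (hypothetical helper).
    static int sumInts(String input) {
        int sum = 0;
        try (Scanner sc = new Scanner(input)) {
            // Point 1: describe what separates the fields, not the fields themselves.
            sc.useDelimiter("[;\\r\\n]+");
            while (sc.hasNext()) {
                // Point 2: only call nextInt() after hasNextInt() has said yes.
                if (sc.hasNextInt()) {
                    sum += sc.nextInt();
                } else {
                    sc.next(); // discard a token that is not an integer
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("10;abc\n20;30")); // prints 60
    }
}
```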

Upvotes: 7

user1025189

Reputation: 71

So, why don't you add a newline to your nextLine() result?

Isn't there a regex special character, ^ or $, that stands for the string's bounds?

Upvotes: 2

user890904

Reputation:

The API documentation clearly specifies that nextLine() returns the line without its line separator.

You can follow one of the suggestions in the other replies. But please also note that Scanner has methods that take a pattern, so if your regex is correct, you can use the following methods:

hasNext(Pattern pattern) or hasNext(String pattern) to find if you have more tokens

and then

next(Pattern pattern) or next(String pattern) to get the token if the above returned true.
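A small sketch of how these pattern-taking methods might be used together. Note that hasNext(Pattern) and next(Pattern) test the next complete token, so the delimiter still decides where tokens begin and end. The class name, the method name, and the \w+ pattern are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class PatternTokens {
    // Illustrative token pattern: one or more word characters.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Collects the tokens that match WORD and skips the rest (hypothetical helper).
    static List<String> words(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter("[;\\r\\n]+");
            while (sc.hasNext()) {
                if (sc.hasNext(WORD)) {
                    out.add(sc.next(WORD)); // safe: hasNext(WORD) just returned true
                } else {
                    sc.next(); // token does not match the pattern, skip it
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("abcd;\nxyz;!!;q1")); // prints [abcd, xyz, q1]
    }
}
```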

Upvotes: 1

fredo
fredo

Reputation: 325

You can use \z in your regex pattern to denote the end of the input, or $ for the end of a line (with Pattern.MULTILINE; without it, $ matches only at the end of the input or just before a final line terminator). Furthermore, Scanner.nextLine() returns the line without its line separator. Also, you could change the delimiters used by your Scanner to include ; with its useDelimiter method. Lastly, your pattern might not do what you think it does: according to the documentation for Pattern, \p{Z} matches the Unicode separator category (spaces, line separators, and paragraph separators), which does not include \n or \r.
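A quick way to check what these constructs match is a sketch like the one below. In Java's Pattern, \p{Z} is the Unicode separator category, so it matches a space but neither the letter Z nor \n. The class and helper names (RegexFacts, isSeparator, lastField) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFacts {
    // True if the whole string is a single character in Unicode category Z (separators).
    static boolean isSeparator(String s) {
        return s.matches("\\p{Z}");
    }

    // Returns the last run of non-delimiter characters, anchored at end of input with \z.
    static String lastField(String input) {
        Matcher m = Pattern.compile("[^;\\r\\n]+\\z").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isSeparator(" "));           // true: space is a separator
        System.out.println(isSeparator("Z"));           // false: the letter Z is not
        System.out.println(isSeparator("\n"));          // false: newline is a control character
        System.out.println(lastField("abcd;\nxyz"));    // xyz
    }
}
```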

Upvotes: 0

Calum
Calum

Reputation: 5926

The regex character $ matches the end of the input (or the end of each line under Pattern.MULTILINE). Having said that, since the string returned by nextLine() doesn't contain the end-of-line character, it's easy to consume everything up to the first semicolon; just match everything other than a semicolon:

[^;]+

Scanner consumes the newline character as part of its behaviour because you don't usually want to deal with it, and it's system-dependent.

Edit: In a comment someone pointed out you could just use line.split(";") and grab the first value. This would work too.
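Both approaches might be sketched as follows, applied to a single line as returned by nextLine(). The names FirstValue, viaRegex, and viaSplit are made up here, and the limit argument of 2 passed to split is an addition so that a line consisting only of ";" still yields an empty first element instead of an empty array.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstValue {
    private static final Pattern NOT_SEMICOLON = Pattern.compile("[^;]+");

    // First run of characters that are not semicolons, or "" if there is none.
    static String viaRegex(String line) {
        Matcher m = NOT_SEMICOLON.matcher(line);
        return m.find() ? m.group() : "";
    }

    // Everything before the first semicolon; limit 2 keeps trailing empty strings.
    static String viaSplit(String line) {
        return line.split(";", 2)[0];
    }

    public static void main(String[] args) {
        System.out.println(viaRegex("abcd;efgh")); // abcd
        System.out.println(viaSplit("abcd;efgh")); // abcd
        System.out.println(viaSplit("xyz"));       // xyz
    }
}
```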

Upvotes: 1
