Reputation: 11915
I have a scanner set up that is working on an InputStream.
I am using Scanner.nextLine() to advance to each line, then doing some regular expression work on each line.
I have a regular expression that is basically [\w\p{Z}]+?[;\n\r], which should pick up everything to the end of the line, or just ONE item if the line is semicolon-delimited.
So if my InputStream looks like
abcd;
xyz
It will pick up abcd;, but not xyz.
I think this is because the newline character at the end of each line of text is being consumed when Scanner.nextLine() is called. Can anyone tell me how to fix this problem?
As an additional point of info, I am compiling the pattern with Pattern.DOTALL.
Thanks!
Upvotes: 5
Views: 49833
Reputation: 75222
Actually, you're the one that's causing the problem, by trying to consume a newline at the end of the last line. :-/ It's perfectly valid for the last line to end abruptly without a newline character, but your regex requires it to have one. You might be able to fix that by replacing the newline with an anchor or a lookahead, but there are much easier ways to go about this.
One is to override the default delimiter and iterate over the fields with next():
Scanner sc1 = new Scanner("abcd;\nxyz");
sc1.useDelimiter("[;\r\n]+");
while (sc1.hasNext())
{
System.out.printf("%s%n", sc1.next());
}
The other is to iterate over the lines with nextLine() (using the default delimiter) and then split each line on semicolons:
Scanner sc2 = new Scanner("abcd;\nxyz");
while (sc2.hasNextLine())
for (String item : sc2.nextLine().split(";"))
{
System.out.printf("%s%n", item);
}
Scanner's API is one of the most bloated and unintuitive I've ever worked with, but you can greatly reduce the pain of using it if you remember these two crucial points:
1. [...] split()).
2. Never call one of the nextXXX() methods without first calling the corresponding hasNextXXX() method.
Upvotes: 7
Reputation: 71
So, why don't you add a newline to your nextLine() result?
Isn't there a regex special character, ^ or $, that stands for the string's bounds?
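As a rough sketch of the anchor idea, the trailing [;\n\r] in the question's pattern could be swapped for "a semicolon or the end of the line". The class and method names here are made up for illustration, and the input is a String rather than an InputStream to keep the example self-contained:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class AnchoredFieldDemo {
    // Same character class as in the question, but terminated by either
    // a semicolon or the end of the line ($) instead of a newline.
    static final Pattern FIELD = Pattern.compile("[\\w\\p{Z}]+?(?:;|$)");

    static List<String> fields(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                // nextLine() returns the line without its terminator,
                // so the pattern must not require one.
                Matcher m = FIELD.matcher(sc.nextLine());
                while (m.find()) {
                    out.add(m.group());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("abcd;\nxyz")); // prints [abcd;, xyz]
    }
}
```

Adding the newline back onto each line would also satisfy the original pattern, but anchoring avoids depending on a character that nextLine() has already stripped.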
Upvotes: 2
Reputation:
The API clearly specifies that nextLine() removes any line separator, so you can do one of the various suggestions in the other replies. But please also notice that Scanner has methods that take a pattern. So if your regex is correct, you can use the following methods:
hasNext(Pattern pattern) or hasNext(String pattern) to find if you have more tokens
and then
next(Pattern pattern) or next(String pattern) to get the token if the above returned true.
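A minimal sketch of how these pattern-taking methods might be used. The delimiter, the \w+ pattern, and the class name are assumptions chosen for illustration. Note that hasNext(Pattern) tests the whole next token (as split off by the delimiter) against the pattern, so a token that does not match has to be skipped explicitly:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class for Scanner.hasNext(Pattern) / next(Pattern).
class PatternTokenDemo {
    static List<String> wordTokens(String input) {
        Pattern word = Pattern.compile("\\w+"); // assumed token shape
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            // Split tokens on semicolons and line breaks.
            sc.useDelimiter("[;\\r\\n]+");
            while (sc.hasNext()) {
                if (sc.hasNext(word)) {
                    out.add(sc.next(word)); // whole token matched \w+
                } else {
                    sc.next();              // discard a non-matching token
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(wordTokens("abcd;\n?!;xyz")); // prints [abcd, xyz]
    }
}
```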
Upvotes: 1
Reputation: 325
You can use \z in your regex pattern to denote the end of the input, or $ for the end of the line. Furthermore, Scanner.nextLine() by default returns the line without the newline character. Also, you could change the delimiters used by your Scanner to include ; with its useDelimiter method. Lastly, your pattern might not do what you think it does: \p{Z} matches Unicode separator characters (category Z, such as spaces), not the letter 'Z', and it does not match \n or \r, which are control characters.
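A quick way to check what \p{Z} matches in java.util.regex is to test a few single characters against it. This is only a sketch and the class name is invented; \p{Z} is the Unicode separator category, so it accepts spaces but not the letter Z or the control characters \n and \t:

```java
// Hypothetical demo class showing which characters \p{Z} accepts.
class SeparatorClassDemo {
    // True if the whole string is exactly one Unicode separator character.
    static boolean isSeparator(String s) {
        return s.matches("\\p{Z}");
    }

    public static void main(String[] args) {
        System.out.println(isSeparator(" "));      // true  (space, category Zs)
        System.out.println(isSeparator("\u00A0")); // true  (no-break space, Zs)
        System.out.println(isSeparator("Z"));      // false (a letter)
        System.out.println(isSeparator("\n"));     // false (control character)
        System.out.println(isSeparator("\t"));     // false (control character)
    }
}
```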
Upvotes: 0
Reputation: 5926
The regex character $ matches the end of the input (or the end of each line in MULTILINE mode). Having said that, since you don't have the end-of-line character, it's easy to consume everything up to the first semicolon; just consume everything other than a semicolon:
[^;]+
Scanner consumes the newline character as part of its behaviour because you don't usually want to deal with it, and it's system-dependent.
Edit: In a comment someone pointed out you could just use line.split(";") and grab the first value. This would work too.
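Both variants can be sketched side by side. The class and method names are hypothetical, and each method takes a line as returned by nextLine(), that is, without its terminator:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two approaches.
class FirstFieldDemo {
    static final Pattern UP_TO_SEMICOLON = Pattern.compile("[^;]+");

    // First run of non-semicolon characters, or "" if there is none.
    static String firstField(String line) {
        Matcher m = UP_TO_SEMICOLON.matcher(line);
        return m.find() ? m.group() : "";
    }

    // Same idea via split(); split() drops trailing empty strings,
    // so the resulting array can be empty for an input such as ";".
    static String firstFieldBySplit(String line) {
        String[] parts = line.split(";");
        return parts.length > 0 ? parts[0] : "";
    }

    public static void main(String[] args) {
        System.out.println(firstField("abcd;"));        // prints abcd
        System.out.println(firstField("xyz"));          // prints xyz
        System.out.println(firstFieldBySplit("abcd;")); // prints abcd
    }
}
```

The two differ when a line starts with a semicolon: the regex skips ahead to the first non-empty run, while split() yields an empty first element.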
Upvotes: 1