Reputation: 10848
Does anyone know how the Scanner's .next() method treats punctuation? I couldn't find the answer to this anywhere. I have a program that's reading each word in from a text file and I am unsure of how it treats parts like "that's" or "they are," or "her."
For periods and commas, are they treated as separate tokens or as part of the word when they appear as in "her." or "her,"? In other words, does the Scanner consider "her", "her." and "her," to be different words?
For apostrophes, do they get accounted for or do they effectively split the word in two? For example, would "they're" be recognized as "they" "'" "re" or would it be recognized as "they're" altogether?
I hope I came across clearly on this question.
Upvotes: 0
Views: 8599
Reputation: 116286
I didn't know (only guessed), so I've tried it myself:
import java.util.Scanner;

String input = "That's what they are, I told her. She said, it ain't so!";
Scanner s = new Scanner(input); // the default delimiter is whitespace
while (s.hasNext()) {
    System.out.println(s.next()); // print each whitespace-separated token on its own line
}
Output:
That's
what
they
are,
I
told
her.
She
said,
it
ain't
so!
Upvotes: 0
Reputation: 72049
The default delimiter for Scanner is whitespace, so none of the examples you provided will be split. Why not try it yourself, though?
String input = "That's a they are, her. They're here.";
Scanner scanner = new Scanner(input);
while (scanner.hasNext()) {
    System.out.println(scanner.next());
}
If you did want to split on apostrophes (') as well as whitespace, you could use something like:
Scanner scanner = new Scanner(input).useDelimiter("[\\s']");
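To see what that delimiter does in practice, here is a minimal, self-contained sketch. The class name ApostropheSplit and the helper tokens are made up for illustration, and the trailing + in the pattern is my own addition (an assumption, not part of the line above) so that a run of delimiters such as two spaces does not produce empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ApostropheSplit {
    // Hypothetical helper: collects every token the Scanner yields
    // for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(delimiterRegex)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "That's a they are, her. They're here.";
        // Split on runs of whitespace and/or apostrophes.
        // Periods and commas are not delimiters here, so they stay attached.
        System.out.println(tokens(input, "[\\s']+"));
        // prints [That, s, a, they, are,, her., They, re, here.]
    }
}
```

Note that "They're" now comes back as two tokens, "They" and "re", while "her." keeps its period.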
Upvotes: 0
Reputation: 30733
Scanner has a useDelimiter method which lets you specify which characters will be considered 'word breakers'. The default delimiter is the whitespace pattern, so punctuation symbols will be included in the word.
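If the goal is for "her", "her." and "her," to count as the same word, one possible approach is to make everything except letters and apostrophes a word breaker. This is only a sketch under that assumption: the class name WordTokens, the helper words, and the pattern [^\p{L}']+ are my own choices for illustration, and you may want a different rule for digits or hyphens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WordTokens {
    // Hypothetical helper: any run of characters that is neither a letter
    // nor an apostrophe is treated as a delimiter.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter("[^\\p{L}']+")) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "That's what they are, I told her. She said, it ain't so!";
        System.out.println(words(input));
        // prints [That's, what, they, are, I, told, her, She, said, it, ain't, so]
    }
}
```

With this delimiter the trailing commas, periods and the exclamation mark are dropped, while "That's" and "ain't" stay whole because the apostrophe is excluded from the delimiter set.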
Upvotes: 2