user2392965

Reputation: 445

Java Scanner Split Strings by Sentences

I am trying to split a paragraph of text into separate sentences based on punctuation marks, i.e. [.?!]. However, the scanner also seems to split at the end of each line, even though I've specified a particular delimiter pattern. How do I resolve this? Thanks!

this is a text file. yes the
deliminator works
no it does not. why not?

Scanner scanner = new Scanner(fileInputStream);
scanner.useDelimiter("[.?!]");
while (scanner.hasNext()) {
  String line = scanner.next();
  System.out.println(line);
}

Upvotes: 1

Views: 2097

Answers (1)

mrzli

Reputation: 17369

I don't believe the scanner splits on line breaks. Your "line" variables simply contain line breaks, because a newline is not one of your delimiters, and that is why you get that output. For example, you can replace those line breaks with spaces:

(I am reading the same input text you supplied from a file, so it has some extra file reading code, but you'll get the picture.)

try {
    File file = new File("assets/test.txt");
    Scanner scanner = new Scanner(file);
    scanner.useDelimiter("[.?!]");
    while (scanner.hasNext()) {
        String sentence = scanner.next();
        sentence = sentence.replaceAll("\\r?\\n", " ");
        // uncomment to strip leading/trailing whitespace for nicer output
        //sentence = sentence.trim();
        System.out.println(sentence);
    }
    scanner.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}

This is the result:

this is a text file
 yes the deliminator works no it does not
 why not

And if I uncomment the trim line, it's a bit nicer:

this is a text file
yes the deliminator works no it does not
why not
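
If you want something you can run without a file, here is a minimal self-contained sketch of the same idea. The class name SentenceSplitter and the helper splitSentences are my own names for illustration, not anything from your code, and I am feeding the Scanner a String instead of a file. It collapses every run of whitespace (line breaks included) into a single space, trims the result, and skips empty pieces such as a trailing newline after the last punctuation mark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SentenceSplitter {

    // Hypothetical helper: splits text on '.', '?' or '!' and tidies each piece.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[.?!]");
            while (scanner.hasNext()) {
                // Collapse whitespace runs (including line breaks) into one space,
                // then strip leading/trailing whitespace.
                String sentence = scanner.next().replaceAll("\\s+", " ").trim();
                // Skip leftovers that contained only whitespace.
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "this is a text file. yes the\ndeliminator works\nno it does not. why not?\n";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

Note that the punctuation itself is consumed as the delimiter, so it does not appear in the returned sentences.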

Upvotes: 5
