Reputation: 105
I have a text file and wish to print the first sentence of each paragraph. Paragraphs are separated by a line break, i.e. "\n".
From the BreakIterator documentation, I thought I could use getLineInstance() for this, but it seems to iterate over each word instead:
public String[] extractFirstSentences() {
    BreakIterator boundary = BreakIterator.getLineInstance(Locale.US);
    boundary.setText(getText());
    List<String> sentences = new ArrayList<String>();
    int start = boundary.first();
    int end = boundary.next();
    while (end != BreakIterator.DONE) {
        String sentence = getText().substring(start, end).trim();
        if (!sentence.isEmpty()) {
            sentences.add(sentence);
        }
        start = end;
        end = boundary.next();
    }
    return sentences.toArray(new String[sentences.size()]);
}
Am I using getLineInstance() incorrectly or is there another method to do what I want?
Upvotes: 2
Views: 2729
Reputation: 54796
How about this as an alternative:
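public String[] extractFirstSentences() {
    String myText = getText();
    // One paragraph per line; also tolerate Windows line endings.
    String[] paragraphs = myText.split("\\r?\\n");
    List<String> result = new ArrayList<String>();
    for (String paragraph : paragraphs) {
        // Split after the terminator (lookbehind) so the sentence keeps its own '.', '?' or '!'.
        String first = paragraph.trim().split("(?<=[.?!])\\s+")[0];
        if (!first.isEmpty()) {
            result.add(first);
        }
    }
    return result.toArray(new String[result.size()]);
}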
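If you would rather stay with BreakIterator, note that getLineInstance() reports positions where a line could be wrapped, which is why it appears to step word by word. A minimal sketch using getSentenceInstance() on each paragraph might look like the following. The class name FirstSentences and the text parameter are hypothetical, introduced only to keep the example self-contained (the original method reads from getText()), and paragraphs are assumed to be separated by a single line break as in the question:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FirstSentences {
    // Hypothetical helper: takes the text as a parameter instead of calling getText().
    static String[] extractFirstSentences(String text) {
        BreakIterator boundary = BreakIterator.getSentenceInstance(Locale.US);
        List<String> result = new ArrayList<String>();
        // Assumption: one paragraph per line, as described in the question.
        for (String paragraph : text.split("\\r?\\n")) {
            if (paragraph.trim().isEmpty()) {
                continue; // skip blank lines between paragraphs
            }
            boundary.setText(paragraph);
            int start = boundary.first();
            int end = boundary.next();
            if (end == BreakIterator.DONE) {
                continue;
            }
            // The first boundary pair spans the first sentence, trailing whitespace included.
            result.add(paragraph.substring(start, end).trim());
        }
        return result.toArray(new String[result.size()]);
    }
}
```

Calling FirstSentences.extractFirstSentences(getText()) would then give one entry per non-empty paragraph. The sentence iterator is rule-based, so it may still break after abbreviations such as "Mr."; if that matters for your text, you would need extra handling on top of this.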
Upvotes: 2