Reputation: 2144
We are in the process of splitting a paragraph into sentences based on the dot.
String[] sentences = message.split("(?<=[.!?])\\s*");
The following sentence
HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz
is broken into
HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.
40 GHz
How can I avoid splitting on something like 3.40 GHz? The dot there is part of a number, not a sentence separator.
Upvotes: 1
Views: 2885
Reputation: 569
Try this. It worked for me and is easy to understand:
String str = "This is how I tried to split a paragraph into a sentence. But, there is a problem. My paragraph includes dates like Jan 13, 2014 , words like U.S and numbers like 2.2. They all got splitted by the above code.";
String[] sentenceHolder = str.split("[.?!][^A-Z0-9]");
for (int i = 0; i < sentenceHolder.length; i++) {
System.out.println(sentenceHolder[i]);
}
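One thing to note about this pattern: it consumes the punctuation mark and the character after it, so each piece loses its terminator. Below is a minimal sketch of a lookaround variant that keeps the punctuation. The class and method names are made up for illustration, and the rule "a new sentence starts with an uppercase letter" is an assumption that will not hold for every text.

```java
// Hypothetical demo class; the names are illustrative and not from the answer above.
class KeepPunctuationSplit {

    // Split on whitespace that follows '.', '?' or '!' and precedes an
    // uppercase letter. The lookbehind and lookahead are zero-width, so only
    // the whitespace is removed and the punctuation stays with its sentence.
    static String[] split(String text) {
        return text.split("(?<=[.?!])\\s+(?=[A-Z])");
    }

    public static void main(String[] args) {
        String str = "This is how I tried to split a paragraph into a sentence. "
                + "But, there is a problem. My paragraph includes dates like Jan 13, 2014 , "
                + "words like U.S and numbers like 2.2. They all got splitted by the above code.";
        for (String sentence : split(str)) {
            System.out.println(sentence);
        }
    }
}
```

Abbreviations followed by a capitalized word, such as "U.S. Army", would still be split by this sketch, so it is a heuristic rather than a complete sentence detector.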
Upvotes: 0
Reputation: 20755
String message = "This is an example. This string is for split on '.'."; // add a space after '.' to start a new sentence
Replace
String[] sentences = message.split("(?<=[.!?])\\s*");
with
String[] sentences = message.split("(?<=[.!?])\\s* ");//add a space to split on new sentence
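The added space matters because `\\s*` can match the empty string, so the original pattern splits immediately after every '.', including the one inside 3.40. Requiring at least one whitespace character avoids that. Here is a minimal sketch comparing the two, assuming `\\s+` as the stricter form: it behaves like `\\s* ` for space-separated text and also accepts a tab or newline as the separator. The class and method names are hypothetical.

```java
// Hypothetical demo class comparing the two patterns; names are illustrative.
class SentenceSplitDemo {

    // Original pattern: \s* may match nothing, so every '.', '!' or '?' splits.
    static String[] splitLoose(String text) {
        return text.split("(?<=[.!?])\\s*");
    }

    // Stricter pattern: at least one whitespace character must follow the punctuation.
    static String[] splitStrict(String text) {
        return text.split("(?<=[.!?])\\s+");
    }

    public static void main(String[] args) {
        String message = "HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz. "
                + "Hello, you are welcome.";
        System.out.println("loose:  " + splitLoose(message).length + " pieces");
        System.out.println("strict: " + splitStrict(message).length + " pieces");
        for (String sentence : splitStrict(message)) {
            System.out.println(sentence);
        }
    }
}
```

With the strict pattern, the dot in 3.40 is left alone because no whitespace follows it.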
Upvotes: 0
Reputation: 34146
You could try this:
public static void main(String[] args)
{
String message = "HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz. Hello, you are welcome. StackOverflow. [email protected]";
String[] sentences = message.split("(?<=[.!?])\\s* ");
for (String s : sentences) {
System.out.println(s);
}
}
Output:
HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz.
Hello, you are welcome.
StackOverflow.
[email protected]
Upvotes: 2