SST

Reputation: 2144

How do I split a paragraph into sentences?

We are trying to split a paragraph into sentences at each period.

String[] sentences = message.split("(?<=[.!?])\\s*");

The following sentence

HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz

is broken into

HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3
40 GHz

How can I avoid splitting on something like 3.40 GHz? The dot there is part of a single token, not a sentence separator.

Upvotes: 1

Views: 2885

Answers (3)

ratnesh

Reputation: 569

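Try this. It worked for me and is easy to understand: it splits only where the punctuation is followed by something other than an uppercase letter or a digit, so U.S and 2.2 stay intact. Note that the punctuation and the character after it are consumed by the split.

    String str = "This is how I tried to split a paragraph into a sentence. But, there is a problem. My paragraph includes dates like Jan 13, 2014 , words like U.S and numbers like 2.2. They all got splitted by the above code.";
    String[] sentenceHolder = str.split("[.?!][^A-Z0-9]");
    for (int i = 0; i < sentenceHolder.length; i++) {
        System.out.println(sentenceHolder[i]);
    }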

Upvotes: 0

ravibagul91

Reputation: 20755

String message = "This is an example. This string is for split on '.'."; // add a space after '.' to start a new sentence

Replace

String[] sentences = message.split("(?<=[.!?])\\s*");

with

String[] sentences = message.split("(?<=[.!?])\\s* "); // the trailing space makes the split require a space after the punctuation
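The same idea can also be written with `\\s+`, which requires at least one whitespace character after the punctuation, so the dot inside 3.40 is never a split point. This is only a minimal sketch, assuming sentences are always separated by whitespace; the class and method names are made up for illustration.

```java
class SentenceSplitSketch {
    // Hypothetical helper: split after '.', '!' or '?' only when whitespace follows.
    // The lookbehind is zero-width, so the punctuation stays with its sentence.
    public static String[] splitSentences(String text) {
        return text.split("(?<=[.!?])\\s+");
    }

    public static void main(String[] args) {
        String message = "HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz. Next sentence here!";
        for (String s : splitSentences(message)) {
            System.out.println(s);
        }
    }
}
```

Abbreviations that are followed by a space, such as "U.S. Army", would still be split by this pattern, so it is not a general sentence detector.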

Upvotes: 0

Christian Tapia

Reputation: 34146

You could try this:

public static void main(String[] args)
{
    String message = "HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz. Hello, you are welcome. StackOverflow. [email protected]";
    String[] sentences = message.split("(?<=[.!?])\\s* ");
    for (String s : sentences) {
        System.out.println(s);
    }
}

Output:

HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz.
Hello, you are welcome.
StackOverflow.
[email protected]
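If the regex gets hard to maintain, the JDK's `java.text.BreakIterator` might be worth a look as an alternative. It is a different technique from the split above: it walks locale-specific sentence boundaries instead of matching a pattern. The following is only a sketch; the class and method names are invented for illustration, and where the boundaries fall depends on the locale rules, so it should be checked against your own data.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorSketch {
    // Hypothetical helper: collect the text between consecutive sentence boundaries.
    // Each chunk keeps its trailing whitespace; call trim() when printing.
    public static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        String message = "HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz. Hello, you are welcome. StackOverflow.";
        for (String s : sentences(message)) {
            System.out.println(s.trim());
        }
    }
}
```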

Upvotes: 2
