the_faceless

Reputation: 19

Split a string into sentences and keep the punctuation mark at the end, using a Java regex

I have text like this:

String s = "The if-then-else statement provides a secondary path of execution when an \"if\" clause evaluates to false. You could use an if-then-else statement in the applyBrakes method to take some action if the brakes are applied when the bicycle is not in motion. In this case, the action is to simply print an error message stating that the bicycle has already stopped.";

I need to split this string into sentences but keep the punctuation mark at the end of each sentence, so I can't just use something like this:

s.split("[\\.|!|\\?|:] ");

Because if I use it, I get output like this:

The if-then statement is the most basic of all the control flow statements
It tells your program to execute a certain section of code only if a particular test evaluates to true
For example, the Bicycle class could allow the brakes to decrease the bicycle's speed only if the bicycle is already in motion
One possible implementation of the applyBrakes method could be as follows:

And I'm losing the punctuation mark at the end of each sentence. How can I keep it?

Upvotes: 1

Views: 1386

Answers (2)

Pshemo

Reputation: 124265

First of all, your regex [\\.|!|\\?|:] matches . or | or ! or ? or : because inside a character class [...] the | is a literal character, not alternation. You probably wanted (\\.|!|\\?|:) or, probably better, [.!?:] (I am not sure why you want : here, but that is your choice).
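
To make the difference visible, here is a minimal sketch comparing the two character classes on a string that contains a pipe. The class name CharClassPipeDemo and both method names are made up for illustration; only String.split and Arrays.toString come from the JDK.

```java
import java.util.Arrays;

public class CharClassPipeDemo {
    // Hypothetical helper: uses the character class from the question.
    // Inside [...] the '|' is a literal, so a pipe also acts as a delimiter.
    public static String[] splitWithOriginalClass(String text) {
        return text.split("[\\.|!|\\?|:]");
    }

    // Hypothetical helper: the cleaned-up class, with no '|' and no escapes.
    public static String[] splitWithCleanClass(String text) {
        return text.split("[.!?:]");
    }

    public static void main(String[] args) {
        String sample = "a|b.c";
        System.out.println(Arrays.toString(splitWithOriginalClass(sample))); // prints [a, b, c]
        System.out.println(Arrays.toString(splitWithCleanClass(sample)));    // prints [a|b, c]
    }
}
```

With the original class the pipe is swallowed as a separator; with [.!?:] it stays part of the text.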

Next, if you want to split on whitespace that is preceded by ., !, ? or : without consuming that preceding character, use a look-behind, like this:

split("(?<=[.!?:])\\s")
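
As a small runnable sketch of that look-behind split (the class name LookBehindSplitDemo, the method name splitSentences and the sample text are assumptions made for this example):

```java
import java.util.Arrays;

public class LookBehindSplitDemo {
    // Splits on a single whitespace character that directly follows
    // '.', '!', '?' or ':'. The look-behind is zero-width, so the
    // punctuation stays attached to the sentence before it.
    public static String[] splitSentences(String text) {
        return text.split("(?<=[.!?:])\\s");
    }

    public static void main(String[] args) {
        String sample = "It works. Does it? Yes!";
        System.out.println(Arrays.toString(splitSentences(sample)));
        // prints [It works., Does it?, Yes!]
    }
}
```

Only the whitespace is consumed, so each element still ends with its own punctuation mark.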

But the best approach would be to use a proper tool for splitting sentences, which is BreakIterator. You can find a usage example in this question: Split string into sentences based on periods
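
For reference, a minimal sketch of the BreakIterator approach could look like this. The class name BreakIteratorDemo, the method name sentences, the choice of Locale.US and the trimming of each piece are assumptions for illustration, not taken from the linked question.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorDemo {
    // Walks the sentence boundaries reported by BreakIterator and
    // collects the text between each pair of boundaries.
    public static List<String> sentences(String text) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(text);
        List<String> result = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            // Each piece includes its trailing whitespace, so trim it.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String sentence : sentences("It works. Does it? Yes!")) {
            System.out.println(sentence);
        }
    }
}
```

Unlike a hand-written regex, the boundaries here come from the JDK's locale-aware sentence rules, so punctuation is kept without any look-around tricks.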

Upvotes: 5

Mena

Reputation: 48424

You can simply alternate a whitespace with the end of input in your pattern:

//                                          | your original punctuation class, 
//                                          | no need for "|" between items
//                                          | (that would include "|" 
//                                          |  as a delimiter)
//                                          | nor escapes, now that I think of it
//                                          |         | look ahead for:
//                                          |         | either whitespace
//                                          |         |     | or end
System.out.println(Arrays.toString(s.split("[.!?:](?=\\s|$)")));

That'll include the last chunk, and print (line breaks added for clarity; the punctuation marks are consumed as delimiters):

[The if-then-else statement provides a secondary path of execution when an "if" clause evaluates to false,  
You could use an if-then-else statement in the applyBrakes method to take some action if the brakes are applied when the bicycle is not in motion,  
In this case, the action is to simply print an error message stating that the bicycle has already stopped]

Upvotes: 1
