Reputation: 89
I'm using the following code in JavaScript to split a string into phrases.
var result = str.match( /[^\n\.!\?\;:]+[\n\.!\?\;:]+/g );
let elements = result.map(element => element.trim());
elements = elements.filter(function (el) {return el != null && el != "";});
It works OK. My problem is when the string contains numbers in the thousands written with a dot as the separator, as some people do, like 1.500. How can I alter this so that it only separates the phrases if the punctuation is followed by a space?
Upvotes: 2
Views: 117
Reputation: 626903
You can use
/(?:[^\n.!?;:]|[\n.!?;:](?!\s))+[\n.!?;:]+/g
See the regex demo. The point is that you match either any char other than the punctuation of your choice, or a punctuation char that is not followed by whitespace, one or more times, and then one or more punctuation chars of your choice.
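To see the negative lookahead on its own, here is a minimal sketch. The variable names and the sample string are made up for illustration; only the `(?!\s)` construct comes from the pattern above.

```javascript
// Illustrative only: names and sample text are assumptions, not from the answer.
// \.(?!\s) matches a dot only when the next character is not whitespace.
const dotNotBeforeSpace = /\.(?!\s)/g;

const sample = "1.500. Next";

// Replace every dot matched by the lookahead version with "_" to make it visible.
// The dot inside 1.500 is replaced; the dot before the space is left alone.
const marked = sample.replace(dotNotBeforeSpace, "_");

console.log(marked); // "1_500. Next"
```

Note that a dot at the very end of the string also satisfies `(?!\s)`, since no whitespace follows it. The full pattern still ends a phrase there because the trailing `[\n.!?;:]+` forces the regex engine to backtrack and give that last dot back.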
Details:
(?: - start of a non-capturing group:
[^\n.!?;:] - any char but a newline, ., !, ?, ; or :
| - or
[\n.!?;:](?!\s) - a newline, ., !, ?, ; or : not followed by a whitespace
)+ - end of the group, repeated one or more times
[\n.!?;:]+ - one or more newline, ., !, ?, ; or : chars.

See a JavaScript demo:
var s = 'It works ok. My problem is when the string has numbers in the thousands marked with a dot that some people use like 1.500. How can alter this so that it only separates the phrases if the punctuation is followed by a space.';
var rx = /(?:[^\n.!?;:]|[\n.!?;:](?!\s))+[\n.!?;:]+/g;
console.log( s.match(rx) );
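The pattern can also be dropped into the trim-and-filter pipeline from the question. The following is a minimal sketch under that assumption; `splitPhrases` is a hypothetical helper name, and the `|| []` fallback is an addition to guard against `match()` returning `null` when nothing matches.

```javascript
// Hypothetical helper (not from the original posts): split text into phrases,
// breaking only after punctuation that is followed by whitespace or the end of input.
function splitPhrases(str) {
  const rx = /(?:[^\n.!?;:]|[\n.!?;:](?!\s))+[\n.!?;:]+/g;
  // String.prototype.match returns null when there is no match,
  // so fall back to an empty array before mapping.
  const matches = str.match(rx) || [];
  // Same clean-up as in the question: trim each phrase and drop empty ones.
  return matches
    .map(phrase => phrase.trim())
    .filter(phrase => phrase !== "");
}

console.log(splitPhrases("The price is 1.500. Is that ok? Yes!"));
// [ 'The price is 1.500.', 'Is that ok?', 'Yes!' ]
```

As with the original pattern, every match must end in at least one punctuation char, so trailing text with no closing punctuation is not returned.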
Upvotes: 2