Ricardo Mota

Reputation: 89

How to split a string by punctuation while ignoring the period in numbers

I'm using the following JavaScript code to split a string into phrases:

var result = str.match( /[^\n\.!\?\;:]+[\n\.!\?\;:]+/g ) || []; // match() returns null when nothing matches
let elements = result.map(element => element.trim());
elements = elements.filter(function (el) {return el != null && el != "";});

It works OK. My problem is when the string contains numbers in the thousands written with a dot as the separator, as some people do, e.g. 1.500. How can I alter this so that it only splits phrases when the punctuation is followed by a space?
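To make the problem concrete, here is a small sketch that runs the question's pattern on a made-up sample sentence and shows `1.500` being cut in two. The redundant escapes are dropped from the character classes; the behaviour is the same.

```javascript
// Pattern from the question: one or more non-punctuation chars,
// then one or more of \n . ! ? ; : (no check for a following space).
const original = /[^\n.!?;:]+[\n.!?;:]+/g;

// Hypothetical sample text, not from the question.
const sample = 'Prices like 1.500 apply. Okay.';

// match() returns null when nothing matches, so fall back to an empty array.
const phrases = (sample.match(original) || [])
  .map(p => p.trim())
  .filter(p => p !== '');

console.log(phrases);
// → [ 'Prices like 1.', '500 apply.', 'Okay.' ]
```

The dot inside `1.500` is treated as a phrase terminator because nothing in the pattern looks at the character that follows it.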

Upvotes: 2

Views: 117

Answers (1)

Wiktor Stribiżew

Reputation: 626903

You can use

/(?:[^\n.!?;:]|[\n.!?;:](?!\s))+[\n.!?;:]+/g

See the regex demo. The idea is to match, one or more times, either any char other than the punctuation of your choice or a punctuation char that is not followed by whitespace, and then match one or more of the punctuation symbols of your choice.

Details:

  • (?: - start of a non-capturing group:
    • [^\n.!?;:] - any char but a newline, ., !, ?, ; or :
  • | - or
    • [\n.!?;:](?!\s) - a newline, ., !, ?, ; or : not followed by whitespace
  • )+ - one or more times
  • [\n.!?;:]+ - one or more newline, ., !, ?, ; or : chars.
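Combining this pattern with the `trim()`/filter steps from the question gives something like the sketch below. The function name `splitPhrases` is hypothetical, and the `|| []` fallback is an addition that guards against `match()` returning `null` when the input has no terminating punctuation at all.

```javascript
// Hypothetical helper: split text into phrases, where . ! ? ; : only end a
// phrase when followed by whitespace or the end of the string.
function splitPhrases(str) {
  // Either a non-punctuation char, or a punctuation char NOT followed by
  // whitespace, repeated; then one or more terminating punctuation chars.
  const rx = /(?:[^\n.!?;:]|[\n.!?;:](?!\s))+[\n.!?;:]+/g;
  // match() returns null when there is no match, so default to [].
  return (str.match(rx) || [])
    .map(p => p.trim())       // later phrases start with the separating space
    .filter(p => p !== '');
}

console.log(splitPhrases('Prices like 1.500 apply. Okay.'));
// → [ 'Prices like 1.500 apply.', 'Okay.' ]
```

As with the question's original pattern, text after the last punctuation mark is not matched, so `splitPhrases('One. Two')` returns `['One.']`.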

See a JavaScript demo:

var s = 'It works ok. My problem is when the string has numbers in the thousands marked with a dot that some people use like 1.500. How can alter this so that it only separates the phrases if the punctuation is followed by a space.';
var rx = /(?:[^\n.!?;:]|[\n.!?;:](?!\s))+[\n.!?;:]+/g;
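console.log( s.match(rx) );
// "1.500." stays inside the second phrase; phrases after the first keep their leading space, so trim() them as in the question.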
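One side effect worth checking against your data: because `\n` is also in the second alternative, a newline followed directly by a non-space char is absorbed into the phrase, whereas the question's original pattern always ended a phrase at a newline. If newlines should stay hard breaks, one possible variant (an assumption about the desired behaviour, not part of the answer above) drops `\n` from that alternative only. The sample text and the `split` helper below are hypothetical.

```javascript
// The answer's pattern: \n may sit inside a phrase when not followed by whitespace.
const answerRx = /(?:[^\n.!?;:]|[\n.!?;:](?!\s))+[\n.!?;:]+/g;
// Variant: \n can never sit inside a phrase, so it always ends one;
// . ! ? ; : still need trailing whitespace (or end of string) to end a phrase.
const variantRx = /(?:[^\n.!?;:]|[.!?;:](?!\s))+[\n.!?;:]+/g;

const text = 'Line one\nLine two. Cost 1.500.';

// match() with the g flag restarts from index 0, so reusing the regexes is safe.
const split = rx => (text.match(rx) || []).map(p => p.trim()).filter(p => p !== '');

console.log(split(answerRx));  // → [ 'Line one\nLine two.', 'Cost 1.500.' ]
console.log(split(variantRx)); // → [ 'Line one', 'Line two.', 'Cost 1.500.' ]
```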

Upvotes: 2
