I have a regex to split a paragraph into sentences:
var sentences = /[^\.!\?]+[\.!\?]+/g;
I would like it to match only if the punctuation ([\.!\?]+) is followed by whitespace (\s). I tried /[^\.!\?]+[\.!\?]+\s/g, but that did not work.
I want this because currently, when a word has punctuation in the middle (like about.me), the regex splits it there, as if the . marked the end of a sentence, which it does not. Any ideas?
For example:
If I have this paragraph:
If the problem being solved isn't as apparent or immediately useful as traffic about.me and navigation data: weather. A few apps are trying to harness the crowd to provide accurate?
I want it to split only into:
["If the problem being solved isn't as apparent or immediately useful as traffic about.me and navigation data: weather.", "A few apps are trying to harness the crowd to provide accurate?"]
whereas currently it splits into:
["If the problem being solved isn't as apparent or immediately useful as traffic about.", "me and navigation data: weather.", "A few apps are trying to harness the crowd to provide accurate?"]
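Running the attempted pattern /[^\.!\?]+[\.!\?]+\s/g on the same paragraph shows that it fails in two places. First, [^\.!\?]+ still cannot cross the . in about.me, and because that . is followed by "m" rather than whitespace, no match can start anywhere before it. Second, the final ? sits at the very end of the string, so the required \s is never there. The paragraph and pattern below are taken from the question; the expected result in the comment was traced by hand.

```javascript
// The example paragraph from the question
const paragraph = "If the problem being solved isn't as apparent or immediately useful as traffic about.me and navigation data: weather. A few apps are trying to harness the crowd to provide accurate?";

// The attempted pattern: a sentence body with no . ! or ?, then punctuation,
// then exactly one required whitespace character
const attempt = /[^\.!\?]+[\.!\?]+\s/g;

const attemptResult = paragraph.match(attempt);

// Only the middle fragment matches: the first sentence is cut off at "about."
// and the last sentence has no whitespace after its "?"
console.log(attemptResult); // one element: "me and navigation data: weather. "
```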
Is this what you want?
var str = "If the problem being solved isn't as apparent or immediately useful as traffic about.me and navigation data: weather. A few apps are trying to harness the crowd to provide accurate?";
// Lazily match up to the first "." or "?" that is followed by whitespace or the end of the string
str.match(/.+?(\.|\?)(\s|$)/g);
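The pattern above keeps the whitespace it consumes (the first element ends in "weather. " with a trailing space) and only recognizes . and ?, even though the question's original regex also handles !. A small variant, sketched here as an assumption rather than as part of this answer, accepts any run of . ! or ? and trims each sentence. It assumes the paragraph is a single line, since . does not match newlines.

```javascript
const str = "If the problem being solved isn't as apparent or immediately useful as traffic about.me and navigation data: weather. A few apps are trying to harness the crowd to provide accurate?";

// Lazily take characters up to the first run of . ! or ? that is
// followed by whitespace or the end of the string
const sentencePattern = /.+?[.!?]+(\s|$)/g;

// match() returns null when nothing matches, so fall back to an empty array,
// then strip the trailing whitespace each match carries
const sentences = (str.match(sentencePattern) ?? []).map((s) => s.trim());

console.log(sentences);
```

The ?? operator needs an ES2020 runtime; on older engines, (str.match(sentencePattern) || []) behaves the same here.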
Use lookahead:
// Match punctuation only when whitespace follows, so the "." in "about.me" is skipped
var re = /[\.!\?]+(?=\s)/g;
var result = "If the problem being solved isn't as apparent or immediately useful as traffic about.me and navigation data: weather. A few apps are trying to harness the crowd to provide accurate?".split(re);
console.log(result.length); // => 2
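Because split() discards whatever the separator matches, splitting on the punctuation removes the . after "weather" and leaves a leading space on the second sentence. If the target runtime supports lookbehind assertions (added in ES2018, so this is an assumption about the environment), a variant is to check for the punctuation in a lookbehind and split on the whitespace itself, which keeps each sentence's punctuation:

```javascript
const paragraph = "If the problem being solved isn't as apparent or immediately useful as traffic about.me and navigation data: weather. A few apps are trying to harness the crowd to provide accurate?";

// Split on runs of whitespace that come right after . ! or ?
// The lookbehind only checks the punctuation, so it stays in the sentence,
// while the whitespace (the actual separator) is removed.
const sentences = paragraph.split(/(?<=[.!?])\s+/);

console.log(sentences.length); // 2
```

If the paragraph can end with whitespace, trim() it first; otherwise split() returns an extra empty string at the end.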