Reputation: 167
I have a large paragraph string that I'm trying to split into sentences using JavaScript's .split() method. I need a regex that matches a period or a question mark ([?.]) followed by a space. However, I need to retain the period or question mark in the resulting array. How can I do this without positive lookbehinds in JS?
Edit: Example input:
"This is sentence 1. This is sentence 2? This is sentence 3."
Example output:
["This is sentence 1.", "This is sentence 2?", "This is sentence 3."]
Upvotes: 4
Views: 816
Reputation: 99011
I guess .match will do it:
(?:\s?)(.*?[.?])
I.e.:
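var sentence = "This is sentence 1. This is sentence 2? This is sentence 3.";
// With the g flag, match() returns whole matches (not capture groups),
// so each element after the first still starts with the matched space.
var result = sentence.match(/(?:\s?)(.*?[.?])/g) || [];
for (var i = 0; i < result.length; i++) {
  document.write(result[i].trim() + "<br>"); // trim() drops that leading space
}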
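Because a global match() returns whole matches, the capture group in the pattern above is not what ends up in the array. A minimal sketch of one way to read the group itself, assuming an environment that supports String.prototype.matchAll (ES2020); the function name splitSentences is made up for illustration:

```javascript
// Hypothetical helper: returns capture group 1 of every match.
function splitSentences(text) {
  // Each matchAll() result is an array: index 0 is the whole match,
  // index 1 is the first capture group (the sentence without the leading space).
  return Array.from(text.matchAll(/\s?(.*?[.?])/g), function (m) {
    return m[1];
  });
}

var sentences = splitSentences("This is sentence 1. This is sentence 2? This is sentence 3.");
console.log(sentences);
```

The pattern only swallows a single whitespace character between sentences, so input with double spaces would still need a trim().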
Upvotes: 0
Reputation: 108
Forget about split(). You want match()
var text = "This is an example paragraph. Oh and it has a question? Ok it's followed by some other random stuff. Bye.";
var matches = text.match(/[\w\s'\";\(\)\,]+(\.|\?)(\s|$)/g);
alert(matches);
The generated matches array contains each sentence:
Array[4]
0:"This is an example paragraph. "
1:"Oh and it has a question? "
2:"Ok it's followed by some other random stuff. "
3:"Bye."
Here is the fiddle with it for further testing: https://jsfiddle.net/uds4cww3/
Edited to match end of line too.
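Since the pattern consumes the whitespace after each sentence, most matches carry a trailing space. A short sketch of trimming them to get the exact output asked for in the question; the variable names here are illustrative, not part of the answer:

```javascript
var input = "This is sentence 1. This is sentence 2? This is sentence 3.";
// Same pattern as above; "|| []" guards against match() returning null.
var raw = input.match(/[\w\s'\";\(\)\,]+(\.|\?)(\s|$)/g) || [];
// Remove the whitespace that the (\s|$) group pulled into each match.
var trimmed = raw.map(function (sentence) {
  return sentence.trim();
});
console.log(trimmed);
```

Note that the character class is a whitelist: characters outside it, such as a hyphen or a colon, are not matched, so a sentence containing one would lose everything up to and including that character.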
Upvotes: 1
Reputation: 11042
This regex will work
([^?.]+[?.])(?:\s|$)
JS Demo
var str = 'This is sentence 1. This is sentence 2? This is sentence 3.';
var regex = /([^?.]+[?.])(?:\s|$)/gm;
var m;
while ((m = regex.exec(str)) !== null) {
document.writeln(m[1] + '<br>');
}
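If an array is wanted instead of writing to the document, the same exec() loop can be wrapped in a function. This is only a sketch; the name collectSentences is an assumption for illustration:

```javascript
// Hypothetical wrapper around the exec() loop above.
function collectSentences(str) {
  var regex = /([^?.]+[?.])(?:\s|$)/g;
  var sentences = [];
  var m;
  // With the g flag, exec() resumes from regex.lastIndex on every call
  // and returns null once there are no further matches.
  while ((m = regex.exec(str)) !== null) {
    sentences.push(m[1]); // group 1 excludes the trailing whitespace
  }
  return sentences;
}

var collected = collectSentences("This is sentence 1. This is sentence 2? This is sentence 3.");
console.log(collected);
```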
Upvotes: 1
Reputation: 44436
This is tacky, but it works:
var breakIntoSentences = function(s) {
var l = [];
s.replace(/[^.?]+.?/g, a => l.push(a));
return l;
}
breakIntoSentences("how? who cares.")
["how?", " who cares."]
(Really how it works: the RE matches a string of not-punctuation, followed by something. Since the match is greedy, that something is either punctuation or the end-of-string.)
This will only capture the first in a series of punctuation marks, so breakIntoSentences("how???? who cares...") also returns ["how?", " who cares."]. If you want to capture all the punctuation, use /[^.?]+[.?]*/g as the RE instead.
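The difference between the two patterns can be checked side by side. A small sketch using match() directly; the variable names are illustrative only:

```javascript
const sample = "how???? who cares...";
// ".?" takes at most one character after the run of non-punctuation.
const firstMarkOnly = sample.match(/[^.?]+.?/g);
// "[.?]*" takes the whole run of trailing punctuation.
const allMarks = sample.match(/[^.?]+[.?]*/g);
console.log(firstMarkOnly); // [ 'how?', ' who cares.' ]
console.log(allMarks);      // [ 'how????', ' who cares...' ]
```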
Edit: Hahaha: Wavvves teaches me about match(), which is what the replace/push does. You learn something new every goddamn day.
In its minimal form, supporting three punctuation marks, and using ES6 syntax, we get:
const breakIntoSentences = s => s.match(/[^.?,]+[.?,]*/g)
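One thing to watch with the one-liner: match() returns null rather than an empty array when nothing matches (for example, on an empty string), and the pieces keep their leading spaces. A hedged variant that handles both, under the assumed name breakIntoSentencesSafe:

```javascript
// Hypothetical variant of the one-liner above, same regex.
const breakIntoSentencesSafe = s =>
  (s.match(/[^.?,]+[.?,]*/g) || []) // fall back to [] when match() returns null
    .map(part => part.trim())       // drop the space left at the start of each piece
    .filter(part => part.length > 0); // discard pieces that were only whitespace

const pieces = breakIntoSentencesSafe("how? who cares.");
console.log(pieces);
```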
Upvotes: 0