jswny

Reputation: 167

Regex Match Punctuation Space but Retain Punctuation

I have a large paragraph string that I'm trying to split into sentences using JavaScript's .split() method. I need a regex that matches a period or question mark ([?.]) followed by a space, but I need to retain the period or question mark in the resulting array. How can I do this without positive lookbehinds in JS?

Edit: Example input: "This is sentence 1. This is sentence 2? This is sentence 3." Example output: ["This is sentence 1.", "This is sentence 2?", "This is sentence 3."]

Upvotes: 4

Views: 816

Answers (5)

Pedro Lobito

Reputation: 99011

I guess .match will do it:

(?:\s?)(.*?[.?])

I.e.:

var sentence = "This is sentence 1. This is sentence 2? This is sentence 3.";
// The i flag is unnecessary here: the pattern contains no letters.
var result = sentence.match(/(?:\s?)(.*?[.?])/g);
for (var i = 0; i < result.length; i++) {
   document.write(result[i] + "<br>");
}
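
As a hedged sketch of the same idea outside the browser: with the g flag, match() returns whole matches rather than capture groups, so every sentence after the first keeps its leading space. The helper name splitSentences below is made up for illustration; it trims each match to get the array the question asks for.

```javascript
// Hypothetical helper (not part of the answer above): same regex, plus a trim.
function splitSentences(text) {
  // match() with the g flag returns full matches, so the optional leading
  // whitespace is part of each result and has to be trimmed off.
  // match() returns null when nothing matches, hence the fallback to [].
  var matches = text.match(/(?:\s?)(.*?[.?])/g) || [];
  return matches.map(function (s) { return s.trim(); });
}

console.log(splitSentences("This is sentence 1. This is sentence 2? This is sentence 3."));
// ["This is sentence 1.", "This is sentence 2?", "This is sentence 3."]
```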

Upvotes: 0

Paulo Arromba

Reputation: 108

Forget about split(). You want match():

var text = "This is an example paragragh. Oh and it has a question? Ok it's followed by some other random stuff. Bye.";

var matches = text.match(/[\w\s'\";\(\)\,]+(\.|\?)(\s|$)/g);


alert(matches);

The generated matches array contains each sentence:

    Array[4]
        0:"This is an example paragragh. "
        1:"Oh and it has a question? "
        2:"Ok it's followed by some other random stuff. "
        3:"Bye."

Here is the fiddle with it for further testing: https://jsfiddle.net/uds4cww3/

Edited to match end of line too.

Upvotes: 1

rock321987

Reputation: 11042

This regex will work:

([^?.]+[?.])(?:\s|$)

var str = 'This is sentence 1. This is sentence 2? This is sentence 3.';
var regex = /([^?.]+[?.])(?:\s|$)/gm;
var m;

while ((m = regex.exec(str)) !== null) {
    document.writeln(m[1] + '<br>');
}
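
To get an array instead of writing to the page, the captured group can be collected directly. This is only a sketch, assuming an environment that supports String.prototype.matchAll (ES2020); the function name sentencesFromGroups is hypothetical.

```javascript
// Hypothetical helper: collects capture group 1 of every match into an array.
function sentencesFromGroups(str) {
  // Group 1 holds the sentence with its punctuation; the trailing
  // whitespace (or end of input) is matched but not captured.
  var regex = /([^?.]+[?.])(?:\s|$)/g;
  return Array.from(str.matchAll(regex), function (m) { return m[1]; });
}

console.log(sentencesFromGroups("This is sentence 1. This is sentence 2? This is sentence 3."));
// ["This is sentence 1.", "This is sentence 2?", "This is sentence 3."]
```

Because the separator sits outside the capture group, no trimming is needed afterwards.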

Upvotes: 1

Michael Lorton

Reputation: 44436

This is tacky, but it works:

var breakIntoSentences = function(s) {
  var l = [];
  s.replace(/[^.?]+.?/g, a => l.push(a));
  return l;
}

breakIntoSentences("how? who cares.")
["how?", " who cares."]

(Really how it works: the RE matches a string of not-punctuation, followed by something. Since the match is greedy, that something is either punctuation or the end-of-string.)

This will only capture the first in a series of punctuation, so breakIntoSentences("how???? who cares...") also returns ["how?", " who cares."]. If you want to capture all the punctuation, use /[^.?]+[.?]*/g as the RE instead.
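
A quick side-by-side of the two patterns, as an illustration only (the variable names firstOnly and allPunct are made up here):

```javascript
// The two patterns from this answer, applied to the same input.
var firstOnly = /[^.?]+.?/g;     // text plus at most one following character
var allPunct = /[^.?]+[.?]*/g;   // text plus the whole run of punctuation

var input = "how???? who cares...";
var a = input.match(firstOnly);
var b = input.match(allPunct);

console.log(a); // ["how?", " who cares."]
console.log(b); // ["how????", " who cares..."]
```

Note that neither pattern strips the space that follows the punctuation, so every element after the first starts with a space.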

Edit: Hahaha: Wavvves teaches me about match(), which is what the replace/push does. You learn something new every goddamn day.

In its minimal form, supporting three punctuation marks, and using ES6 syntax, we get:

const breakIntoSentences = s => s.match(/[^.?,]+[.?,]*/g)

Upvotes: 0

Redu

Reputation: 26191

Maybe this one gives you the array items you want:

\b.*?[?\.](?=\s|$)
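
A minimal sketch of how this pattern might be used with match() and the g flag (the variable names are illustrative):

```javascript
var str = "This is sentence 1. This is sentence 2? This is sentence 3.";

// The lookahead (?=\s|$) requires a space or the end of the string after the
// punctuation without consuming it, and \b makes each match start at a word
// boundary, so the separating space is never part of a result.
var sentences = str.match(/\b.*?[?\.](?=\s|$)/g);

console.log(sentences);
// ["This is sentence 1.", "This is sentence 2?", "This is sentence 3."]
```

One caveat of \b: a sentence that begins with a non-word character, such as an opening quote, would lose that character from the match.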

Upvotes: 0
