Andres SK

Reputation: 10974

Regex that splits long text into separate sentences with match()

This is a textarea where the user writes some text. I've written an example in it.

<textarea id="text">First sentence. Second sentence? Third sentence!
Fourth sentence.

Fifth sentence
</textarea>

Requirements already considered in the regex

Missing requirement (I need help with this)

Each new line should be represented by an empty array item. If the regex is applied, this should be the response:

["First sentence.", "Second sentence?", "Third sentence!", "", "Fourth sentence.", "", "", "Fifth sentence"]

Instead, I'm receiving this:

["First sentence.", "Second sentence?", "Third sentence!", "Fourth sentence.", "Fifth sentence"]

This is the regex and match call:

var tregex = /[^\r\n.!?]+(?:(?:\r\n|[\r\n]|[.!?])+|$)/gi;
var sentences = $('#text').val().match(tregex).map($.trim);

Any ideas? Thanks!

Upvotes: 0

Views: 2468

Answers (2)

matt3141

Reputation: 4423

I simplified it a lot. The regex now matches either a line break on its own or a sentence together with its trailing punctuation:

var tregex = /\n|([^\r\n.!?]+([.!?]+|$))/gim;

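I also believe the m (multiline) flag is important: it lets $ match at the end of every line, so a final sentence with no punctuation is still matched when a line break follows it.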
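As a minimal sketch of how this behaves, the snippet below runs the regex on the sample text from the question. It assumes plain JavaScript, so String.prototype.trim stands in for the $.trim call in the question, and the names text and sentences are only illustrative.

```javascript
// Sample text from the question (without a trailing line break, for simplicity).
const text =
  "First sentence. Second sentence? Third sentence!\n" +
  "Fourth sentence.\n" +
  "\n" +
  "Fifth sentence";

const tregex = /\n|([^\r\n.!?]+([.!?]+|$))/gim;

// With the g flag, match() returns every full match. Each lone "\n"
// match turns into "" once it is trimmed.
const sentences = (text.match(tregex) || []).map((s) => s.trim());

console.log(sentences);
// → ["First sentence.", "Second sentence?", "Third sentence!", "",
//    "Fourth sentence.", "", "", "Fifth sentence"]
```

If the textarea value ends with a line break, as it does in the question's markup, that final break should show up as one extra empty item at the end of the array.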

Upvotes: 2

Ben Simpson

Reputation: 4049

You can use the following regex:

/((?:\S[^\.\?\!]*)[\.\?\!]*)/g

Let's break this down:

The "g" flag makes the match global, so matching continues after the first occurrence.

Working from the inside out, (?:...) is a non-capturing group: it groups an expression without storing what it matched as a separate result. Inside it, \S matches a single non-whitespace character, so a sentence never starts with a space or line break, and [^\.\?\!]* then matches any run of characters that are not a period, question mark, or exclamation point.

You stated you wanted to keep the punctuation, so the next part, [\.\?\!]*, is a character class containing those same punctuation symbols, which places them inside the outer capturing group. EDIT: I added the asterisk so that any number of punctuation marks, or none at all, can end a sentence.

Check out the matched groups using http://www.pagecolumn.com/tool/regtest.htm, or a similar JavaScript regex tester.
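To see what this pattern returns without a tester, here is a small sketch that applies it to the sample text from the question; the variable names are illustrative only. Because every match has to start on a non-whitespace character, the results need no trimming, but the pattern on its own does not appear to produce empty items for line breaks.

```javascript
// Sample text from the question (without a trailing line break, for simplicity).
const text =
  "First sentence. Second sentence? Third sentence!\n" +
  "Fourth sentence.\n" +
  "\n" +
  "Fifth sentence";

const sregex = /((?:\S[^\.\?\!]*)[\.\?\!]*)/g;

// With the g flag, match() returns the full matches and ignores the
// capture groups; whitespace between sentences is skipped by \S.
const matches = text.match(sregex) || [];

console.log(matches);
// → ["First sentence.", "Second sentence?", "Third sentence!",
//    "Fourth sentence.", "Fifth sentence"]
```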

Upvotes: 1
