Reputation: 10974
This is a textarea where the user writes some text. I've written an example in it.
<textarea id="text">First sentence. Second sentence? Third sentence!
Fourth sentence.

Fifth sentence
</textarea>
Requirements already considered in the regex
Missing requirement (I need help with this):
Each new line should be represented by an empty array item. If the regex is applied, this should be the response:
["First sentence.", "Second sentence?", "Third sentence!", "", "Fourth sentence.", "", "", "Fifth sentence"]
Instead, I'm receiving this:
["First sentence.", "Second sentence?", "Third sentence!", "Fourth sentence.", "Fifth sentence"]
This is the regex and match call:
var tregex = /[^\r\n.!?]+(?:(?:\r\n|[\r\n]|[.!?])+|$)/gi;
var sentences = $('#text').val().match(tregex).map($.trim);
Any ideas? Thanks!
Upvotes: 0
Views: 2468
Reputation: 4423
I simplified it a lot: match either a newline, or a sentence that ends with punctuation or at the end of a line:
var tregex = /\n|([^\r\n.!?]+([.!?]+|$))/gim;
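I also believe the m (multiline) flag is important: it lets $ match at the end of every line, so a sentence without closing punctuation is still captured when a newline follows it.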
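Here is a minimal sketch of that regex running outside the browser. The text variable is a hypothetical stand-in for $('#text').val(), and the blank line between the fourth and fifth sentences is an assumption taken from the expected output in the question. String.prototype.trim is used in place of $.trim so the snippet does not need jQuery.

```javascript
// Hypothetical stand-in for $('#text').val(); the blank line before
// "Fifth sentence" is assumed from the expected output in the question.
var text = "First sentence. Second sentence? Third sentence!\n" +
           "Fourth sentence.\n" +
           "\n" +
           "Fifth sentence";

// Either a lone newline, or a run of non-terminator characters followed by
// punctuation or the end of a line (the m flag makes $ match at line ends).
var tregex = /\n|([^\r\n.!?]+([.!?]+|$))/gim;

// match() returns null when nothing matches, so fall back to an empty array.
var sentences = (text.match(tregex) || []).map(function (s) {
  return s.trim(); // a matched "\n" trims down to the empty string
});

console.log(sentences);
// ["First sentence.", "Second sentence?", "Third sentence!", "",
//  "Fourth sentence.", "", "", "Fifth sentence"]
```

Each newline is matched on its own and then trimmed to an empty string, which is what produces the empty array items. If the input ends with a newline, expect one extra empty item at the end of the array.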
Upvotes: 2
Reputation: 4049
You can use the following regex:
/((?:\S[^\.\?\!]*)[\.\?\!]*)/g
Let's break this down:
"g" is the flag for a global match, meaning the regex keeps matching after the first occurrence.
Working from the inside out, (?:...) is a non-capturing group: it groups an expression without storing what it matched as a separate result. Inside it, \S matches a single non-whitespace character, so a sentence never starts with whitespace, and [^\.\?\!]* matches any run of characters that are not a period, question mark, or exclamation point.
You stated you wanted to keep the punctuation, so the next part, [\.\?\!]*, is a character class containing those same punctuation symbols. It sits inside the outer capturing group, so the punctuation is included in the match. EDIT: I added the asterisk so that this matches any number of punctuation marks, or none at all, at the end of a sentence.
Check out the matched groups using http://www.pagecolumn.com/tool/regtest.htm, or a similar JavaScript regex tester.
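As a quick check without an online tester, the pattern can also be run directly. This is only a sketch: the text variable is a hypothetical stand-in for the textarea value, with the blank line before the fifth sentence assumed from the expected output in the question.

```javascript
// Hypothetical stand-in for the textarea contents.
var text = "First sentence. Second sentence? Third sentence!\n" +
           "Fourth sentence.\n" +
           "\n" +
           "Fifth sentence";

// \S anchors each match on a non-whitespace character, so the spaces and
// newlines between sentences are skipped rather than matched.
var regex = /((?:\S[^\.\?\!]*)[\.\?\!]*)/g;

// With the g flag, match() returns the full text of every match.
var matches = text.match(regex) || [];

console.log(matches);
// ["First sentence.", "Second sentence?", "Third sentence!",
//  "Fourth sentence.", "Fifth sentence"]
```

Because newlines count as whitespace between matches, this pattern splits the sentences and keeps their punctuation, but it does not emit empty items for line breaks.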
Upvotes: 1