wyc

Reputation: 55293

Don't match if there is an extra period at the end of a sentence

I'm using this regex:

\n\n|(?<=[^,][."*?!\^])[ ](?=["*]?[A-Z])

It matches the space after each sentence (I'm using [ ] to mark the matches):

First sentence.[ ]Middle a sentence..[ ]Last sentence.

First sentence?[ ]Middle a sentence!.[ ]Last sentence.

"First sentence," Middle a sentence..[ ]Last sentence.

"First sentence."[ ]"Middle a sentence.".[ ]Last sentence.

It has worked fine so far. But now I want the regex not to match when a sentence ends with .., ?., !., .". and so on. In other words, when there's an extra period:

First sentence.[ ]Middle a sentence.. Last sentence.

First sentence?[ ]Middle a sentence!. Last sentence.

"First sentence," Middle a sentence.. Last sentence.

"First sentence."[ ]"Middle a sentence.". Last sentence.

I thought adding (?!\.) would do the trick:

\n\n|(?<=[^,][."*?!\^])(?!\.)[ ](?=["*]?[A-Z])

But it still matches the space after the extra period.

Why is this, and how can I fix it?

Upvotes: 0

Views: 75

Answers (1)

trincot

Reputation: 350776

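Your (?!\.) has no effect because it is tested at the position just before [ ], where the next character is the space itself, so it can never be a period. The extra period sits before the space, so the check has to look backwards.

One option is to extend the negated class [^,] with every other character that you don't want to occur as the second-to-last character of a sentence.

For example: [^,.]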
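
To see what widening the class does in practice, here is a minimal sketch that runs the modified pattern over three of the sample lines. It assumes Python's `re` module (the question does not name a language), and the names `PATTERN_WIDE` and `mark` are made up for illustration.

```python
import re

# The question's pattern with the negated class widened from [^,] to [^,.]
PATTERN_WIDE = re.compile(r'\n\n|(?<=[^,.][."*?!\^])[ ](?=["*]?[A-Z])')

def mark(pattern, text):
    """Replace every match with the visible marker "[ ]"."""
    return pattern.sub("[ ]", text)

samples = [
    'First sentence. Middle a sentence.. Last sentence.',
    'First sentence? Middle a sentence!. Last sentence.',
    '"First sentence." "Middle a sentence.". Last sentence.',
]

for line in samples:
    print(mark(PATTERN_WIDE, line))

# Prints:
# First sentence.[ ]Middle a sentence.. Last sentence.
# First sentence?[ ]Middle a sentence!.[ ]Last sentence.
# "First sentence." "Middle a sentence.".[ ]Last sentence.
```

The doubled period is now skipped, but `!.` and `".` still match, and the legitimate boundary after `."` is lost because the period there is the second-to-last character.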

If, however, you want to allow ." to end a sentence and only want to exclude a doubled period (or !., ?. and ".), then add your condition there as a negative lookbehind:

(?<=[^,][."*?!\^])(?<![.!?"]\.)
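
As a rough check, the sketch below drops the two lookbehinds into the full pattern from the question and runs it over the four sample lines. Python's `re` module is an assumption here (both lookbehinds are fixed-width, which `re` requires), and `PATTERN` and `mark_matches` are illustrative names only.

```python
import re

# Full pattern: the original lookbehind, then the negative lookbehind
# that rejects an extra period after ., !, ? or "
PATTERN = re.compile(
    r'\n\n|(?<=[^,][."*?!\^])(?<![.!?"]\.)[ ](?=["*]?[A-Z])'
)

def mark_matches(text):
    """Replace every match with the visible marker "[ ]"."""
    return PATTERN.sub("[ ]", text)

samples = [
    'First sentence. Middle a sentence.. Last sentence.',
    'First sentence? Middle a sentence!. Last sentence.',
    '"First sentence," Middle a sentence.. Last sentence.',
    '"First sentence." "Middle a sentence.". Last sentence.',
]

for line in samples:
    print(mark_matches(line))

# Prints:
# First sentence.[ ]Middle a sentence.. Last sentence.
# First sentence?[ ]Middle a sentence!. Last sentence.
# "First sentence," Middle a sentence.. Last sentence.
# "First sentence."[ ]"Middle a sentence.". Last sentence.
```

The output reproduces the desired matches listed in the question: the space after `."` is still matched, while the spaces after `..`, `!.` and `".` are left alone.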

Upvotes: 1
