user1344280

Regular expression to match end punctuation taking abbreviations into account

I'm trying to construct a regular expression that would match the punctuation characters in a sentence. I want this regular expression to avoid matching the periods belonging to abbreviations. Example sentence:

To get more info, help, etc. read through this manual.

In this sentence, the regular expression should match every comma and period except the one in "etc.". To achieve this, I have a list of common English abbreviations. My current regular expression (with the abbreviation list shortened for clarity) is:

(?i)((?<!a\.d|a\.m|abbr|adj|adv|al|etc)(\.)|[,;:!?])$

The sentence is first split on spaces, and this regular expression is run against each word. The problem is that it does not treat the abbreviations as whole words: the period after "manual" is not matched, because "manual" ends with "al", which is in the abbreviation list. How can I modify the expression so that it matches the final period only if the whole word is not in the abbreviation list?
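The problem can be reproduced with a minimal Python sketch. One assumption to flag: Python's standard re module rejects a lookbehind whose alternatives have different lengths, so the alternation (?<!a\.d|a\.m|abbr|adj|adv|al|etc) is rewritten here as one fixed-width negative lookbehind per abbreviation. Since all of them must succeed for the period to match, the behaviour is the same as the original pattern. The names ABBREVIATIONS and ORIGINAL are illustrative only.

```python
import re

# Shortened abbreviation list from the question.
ABBREVIATIONS = ["a.d", "a.m", "abbr", "adj", "adv", "al", "etc"]

# Assumption: Python's re needs fixed-width lookbehinds, so instead of one
# lookbehind with an alternation we stack one lookbehind per abbreviation.
# re.escape turns the "." in "a.d" into a literal "\.".
lookbehinds = "".join(f"(?<!{re.escape(a)})" for a in ABBREVIATIONS)
ORIGINAL = re.compile(rf"(?i)({lookbehinds}(\.)|[,;:!?])$")

sentence = "To get more info, help, etc. read through this manual."
# Split on spaces and keep the words whose final punctuation is matched.
matched = [w for w in sentence.split(" ") if ORIGINAL.search(w)]
print(matched)  # ['info,', 'help,'] -- "manual." is missed because it ends in "al"
```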

Upvotes: 0

Views: 961

Answers (1)

mattman

Reputation: 365

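Inserting a \b at the start of the lookbehind makes an abbreviation count only when it begins at a word boundary. "etc." starts at a boundary, so its period is still skipped, but in "manual." the "al" is preceded by "u", so there is no boundary there and the period is matched.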

For example, (?i)((?<!((\b)(a\.d|a\.m|abbr|adj|adv|al|etc)))(\.)|[,;:!?])$
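Here is the same Python sketch with \b added in front of each abbreviation. As before, it assumes one lookbehind per abbreviation because Python's re requires fixed-width lookbehinds (\b itself is zero-width, so it is allowed inside them). The single-lookbehind form in the answer is meant for flavors that accept lookbehinds of differing lengths, such as Java or .NET. The helper name ends_with_punctuation is hypothetical.

```python
import re

ABBREVIATIONS = ["a.d", "a.m", "abbr", "adj", "adv", "al", "etc"]

# Each lookbehind now starts with \b, so an abbreviation only blocks the
# period when it begins at a word boundary: "etc." is skipped, while
# "manual." is not, because there is no boundary between "u" and "al".
# (One lookbehind per abbreviation is a workaround for Python's
# fixed-width lookbehind rule.)
lookbehinds = "".join(rf"(?<!\b{re.escape(a)})" for a in ABBREVIATIONS)
END_PUNCT = re.compile(rf"(?i)({lookbehinds}\.|[,;:!?])$")


def ends_with_punctuation(word: str) -> bool:
    """True if the word ends in punctuation that is not an abbreviation's period."""
    return END_PUNCT.search(word) is not None


sentence = "To get more info, help, etc. read through this manual."
print([w for w in sentence.split(" ") if ends_with_punctuation(w)])
# ['info,', 'help,', 'manual.']
```

Building the pattern from a list with re.escape keeps the dots in entries like "a.d" literal, so a full abbreviation list can be dropped in without escaping it by hand.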

Upvotes: 2