Meidan Alon

Reputation: 3094

Regex for matching groups but excluding a specific combination of groups

I'm trying to match two groups in an expression, where each group represents a single letter of the initials in a name. For example, in George R. R. Martin the first group would match the first R and the second group would match the second R. I have something like this:

\b([a-zA-Z])[\.{0,1} {0,1}]{1,2}([a-zA-Z])\b

However, I'd like to exclude a specific combination of those groups, say when the first group matches the letter d and the second group matches the letter r.

Is that possible?

Upvotes: 1

Views: 2348

Answers (1)

Wiktor Stribiżew

Reputation: 626794

You may restrict matches with a negative lookahead:

\b(?![dD]\.? ?[rR]\b)([a-zA-Z])\.? ?([a-zA-Z])\b
  ^^^^^^^^^^^^^^^^^^^ 

See the regex demo

Note:

  • The (?![dD]\.? ?[rR]\b) lookahead is best placed right after the word boundary, so that the check is only triggered at a word boundary, not at every position in the string
  • The lookahead is negative: it fails the match if the pattern inside it matches the text that follows
  • Inside the lookahead, the pattern matches a d or D with [dD], then an optional literal dot with \.?, an optional space with " ?" (a space followed by ?), an r or R with [rR], and a trailing word boundary with \b.
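
To see what the lookahead rejects, here is a small Python sketch that tests only the pattern inside it, on its own. The helper name is_excluded and the sample strings are made up for illustration; they are not part of the original answer.

```python
import re

# The pattern from inside the negative lookahead, tested in isolation.
EXCLUDED = re.compile(r"[dD]\.? ?[rR]\b")

def is_excluded(text):
    """Return True if text starts with the d/r combination the lookahead rejects."""
    return EXCLUDED.match(text) is not None

for sample in ["Dr", "d. r", "D.R", "Dre", "R. R"]:
    print(repr(sample), is_excluded(sample))
```

"Dre" is not excluded because the trailing \b requires the r to end a word, and "R. R" is not excluded because it does not start with d or D.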

The main pattern, \b([a-zA-Z])\.? ?([a-zA-Z])\b, is more generic. With the lookahead added, the whole expression breaks down as follows:

  • \b - leading word boundary
  • (?![dD]\.? ?[rR]\b) - the negative lookahead
  • ([a-zA-Z]) - Group 1 capturing an ASCII letter
  • \.? - an optional dot
  • " ?" (a space followed by ?) - an optional space
  • ([a-zA-Z]) - Group 2 capturing an ASCII letter
  • \b - a trailing word boundary
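
As a rough check of the breakdown above, the following Python sketch runs the full pattern against a few names. The function name find_initials and the sample inputs are assumptions chosen for illustration only.

```python
import re

# Full pattern from the answer: negative lookahead right after the word boundary.
INITIALS = re.compile(r"\b(?![dD]\.? ?[rR]\b)([a-zA-Z])\.? ?([a-zA-Z])\b")

def find_initials(text):
    """Return a list of (group 1, group 2) tuples for every match in text."""
    return [m.groups() for m in INITIALS.finditer(text)]

print(find_initials("George R. R. Martin"))  # [('R', 'R')]
print(find_initials("Dr Smith"))             # []
print(find_initials("D. R. Jones"))          # []
```

Because the lookahead is zero-width and non-capturing, the group numbers are unchanged: Group 1 and Group 2 still hold the two letters.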

Upvotes: 2
