user1817376

RegEx to match phrase with uppercase words at beginning and ending

"UPPERCASEWORD UPPERCASEWORD lowercaseword UPPERCASEWORD UPPERCASEWORD"

The following regex matches the above pattern well, but fails to match if there is a digit in the middle:

\b[A-Z][A-Z][A-Z]+(?:[\sa-z,]+[A-Z]+)*\b

For example, "UPPERCASEWORD UPPERCASEWORD lowercaseword 1 UPPERCASEWORD UPPERCASEWORD" (where 1 stands for any digit) will not match as a single phrase.

Any idea how to include a digit in the match? I tried some options with [0-9] in between, but they did not work.
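A minimal Python reproduction, using the standard re module and the placeholder strings from the question (the 1 stands in for any digit). Without a digit the pattern covers the whole phrase; with one, the match breaks into two pieces, because nothing in the pattern can consume the digit.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"\b[A-Z][A-Z][A-Z]+(?:[\sa-z,]+[A-Z]+)*\b")

plain = "UPPERCASEWORD UPPERCASEWORD lowercaseword UPPERCASEWORD UPPERCASEWORD"
with_digit = "UPPERCASEWORD UPPERCASEWORD lowercaseword 1 UPPERCASEWORD UPPERCASEWORD"

# The group is non-capturing (?:...), so findall() returns whole matches.
print(PATTERN.findall(plain))       # one match covering the whole phrase
print(PATTERN.findall(with_digit))  # two shorter matches, split at the digit
```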

Answers (1)

Highly Irregular

Actually, the given pattern appears to match:

  1. 3 or more uppercase characters, then
  2. 1 or more lowercase characters (or commas or whitespace), then
  3. 1 or more uppercase characters

with steps 2 and 3 repeating zero or more times.
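The same breakdown can be written into the pattern itself. This sketch uses Python's re.VERBOSE flag so each piece carries a comment; the behavior is unchanged, and the short test strings are made up for illustration.

```python
import re

# The question's pattern, rewritten with re.VERBOSE so each step can carry a
# comment. Whitespace outside the character class is ignored in this mode.
QUESTION_PATTERN = re.compile(
    r"""
    \b
    [A-Z][A-Z][A-Z]+    # step 1: three or more uppercase letters
    (?:
        [\sa-z,]+       # step 2: lowercase letters, commas or whitespace
        [A-Z]+          # step 3: one or more uppercase letters
    )*                  # steps 2 and 3 repeat zero or more times
    \b
    """,
    re.VERBOSE,
)

print(QUESTION_PATTERN.fullmatch("ABC def GH") is not None)  # True
print(QUESTION_PATTERN.fullmatch("AB cd EF") is not None)    # False: only 2 leading capitals
print(QUESTION_PATTERN.fullmatch("ABC") is not None)         # True: group used zero times
```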

You might want a pattern that adds 0-9 to the separator character class, so that digits can appear anywhere between the uppercase words:

\b[A-Z][A-Z][A-Z]+(?:[\sa-z0-9,]+[A-Z]+)*\b

which is equivalent to:

\b[A-Z]{3,}(?:[\sa-z0-9,]+[A-Z]+)*\b

Or, if the first word may have as few as 2 uppercase characters, use this:

\b[A-Z]{2,}(?:[\sa-z0-9,]+[A-Z]+)*\b
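A minimal Python sketch to check a digit-tolerant version against the question's example. It assumes the digit can appear anywhere between the uppercase words (in the question's example a space follows it), which is why 0-9 is placed inside the [\sa-z0-9,] class here; the second test string is made up for illustration.

```python
import re

# Digits are allowed anywhere in the run between uppercase words because
# 0-9 sits inside the separator character class.
FIXED = re.compile(r"\b[A-Z]{3,}(?:[\sa-z0-9,]+[A-Z]+)*\b")

with_digit = "UPPERCASEWORD UPPERCASEWORD lowercaseword 1 UPPERCASEWORD UPPERCASEWORD"
several = "UPPERCASEWORD lowercaseword 42, more words 7 UPPERCASEWORD"

for text in (with_digit, several):
    # fullmatch() succeeds only if the whole phrase is a single match
    print(FIXED.fullmatch(text) is not None)  # True, True
```

A trailing digit with no uppercase word after it (for example "UPPERCASEWORD 1") still does not match as a whole, since each repetition of the group must end in [A-Z]+.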

It would help if you posted some test data.

Update: It sounds like you want something quite different from what you originally described. Will this do, or does it match too much?

\b[A-Z]{2}.*[A-Z]{2}\b
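A minimal Python sketch of what "matches too much" can look like, using a made-up input. Because .* is greedy, the match stretches from the first pair of capitals to the last pair on the line, swallowing everything in between.

```python
import re

# The pattern from the update: two capitals, anything, two capitals.
LOOSE = re.compile(r"\b[A-Z]{2}.*[A-Z]{2}\b")

# Hypothetical input: two uppercase phrases with ordinary prose in between.
text = "ABC DEF some words 42 GHI and then a long run of prose before XYZ"

# .* is greedy, so the match runs to the LAST pair of capitals on the line.
print(LOOSE.search(text).group(0) == text)  # True

# . does not match a newline by default, so a match never spans lines.
print(LOOSE.search("ABC DEF\nplain text XYZ").group(0))  # ABC DEF
```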

If that matches too much and your tool supports negative lookahead, this might work (it stops the match from running across two consecutive lowercase words), but it's getting pretty messy:

\b[A-Z]{2}((?!\b[a-z][a-z'0-9]+\b\s[a-z][a-z'0-9]+).)*[A-Z]{2}\b
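The ((?!...).)* part is sometimes called a tempered greedy token: before consuming each character, the negative lookahead checks that two lowercase words (2 or more characters each, separated by one whitespace character) do not start at that position. This Python sketch uses made-up test strings to show the effect. Note that the pattern contains a capturing group, so re.findall() would return that group rather than the whole match; finditer() with group(0) avoids this.

```python
import re

# The lookahead pattern from the answer. Before each character is consumed,
# (?!...) checks that two lowercase words (2+ characters each, separated by
# one whitespace character) do not start at this position.
TEMPERED = re.compile(
    r"\b[A-Z]{2}((?!\b[a-z][a-z'0-9]+\b\s[a-z][a-z'0-9]+).)*[A-Z]{2}\b"
)

def phrases(text):
    # group(0) is the whole match; findall() would return the capture group.
    return [m.group(0) for m in TEMPERED.finditer(text)]

print(phrases("ABC DEF lowercaseword 1 GHI JKL"))  # ['ABC DEF lowercaseword 1 GHI JKL']
print(phrases("ABC DEF lower case GHI JKL"))       # ['ABC DEF', 'GHI JKL']
```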
