"UPPERCASEWORD UPPERCASEWORD lowercaseword UPPERCASEWORD UPPERCASEWORD"
The following regex matches the above pattern well, but fails to match if there is a digit in the middle:
\b[A-Z][A-Z][A-Z]+(?:[\sa-z,]+[A-Z]+)*\b
For example, "UPPERCASEWORD UPPERCASEWORD lowercaseword 1 (or any digit) UPPERCASEWORD UPPERCASEWORD" will not match.
Any idea how to include a numeral in the match? I tried some options with [0-9] in between, but they did not work.
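A minimal reproduction using Python's `re` module shows what the original pattern does with the digit. Python is an assumption here, because the question does not name a regex engine. `\b`, `\s` and `(?:...)` behave the same way in most common flavors. The repeated group needs an upper-case run after each run of spaces, lower-case letters and commas, so it cannot step over "1". The match therefore ends early and a second, separate match begins after the digit.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = r"\b[A-Z][A-Z][A-Z]+(?:[\sa-z,]+[A-Z]+)*\b"

no_digit = "UPPERCASEWORD UPPERCASEWORD lowercaseword UPPERCASEWORD UPPERCASEWORD"
with_digit = "UPPERCASEWORD UPPERCASEWORD lowercaseword 1 UPPERCASEWORD UPPERCASEWORD"

# No capturing groups, so findall returns whole matches.
# Without a digit, a single match covers the whole line.
print(re.findall(ORIGINAL, no_digit))
# ['UPPERCASEWORD UPPERCASEWORD lowercaseword UPPERCASEWORD UPPERCASEWORD']

# With a digit, the group fails at "1": the first match stops after the
# second word, and a new match starts after the digit.
print(re.findall(ORIGINAL, with_digit))
# ['UPPERCASEWORD UPPERCASEWORD', 'UPPERCASEWORD UPPERCASEWORD']
```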
Actually, the given pattern does appear to match at least part of that string.
You might want this pattern:
\b[A-Z][A-Z][A-Z]+(?:[\sa-z,]+[0-9]*[A-Z]+)*\b
which is equivalent to:
\b[A-Z]{3,}(?:[\sa-z,]+[0-9]*[A-Z]+)*\b
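A quick Python check confirms that the two forms behave the same, because `[A-Z][A-Z][A-Z]+` and `[A-Z]{3,}` both mean "three or more capitals". It also shows where `[0-9]*` applies. The digit class sits immediately before `[A-Z]+`, so it allows a digit glued to the next upper-case word, such as "1UPPERCASEWORD". A free-standing digit followed by a space is still not crossed. `DIGITS_IN_CLASS` is a hypothetical variant that is not part of the answer. It moves `0-9` into the separator class so that digits may appear anywhere in the lower-case run. The sample strings are also made up for illustration.

```python
import re

ANSWER = r"\b[A-Z][A-Z][A-Z]+(?:[\sa-z,]+[0-9]*[A-Z]+)*\b"
ANSWER_SHORT = r"\b[A-Z]{3,}(?:[\sa-z,]+[0-9]*[A-Z]+)*\b"
# Hypothetical variant (not from the answer): digits inside the separator class.
DIGITS_IN_CLASS = r"\b[A-Z]{3,}(?:[\sa-z0-9,]+[A-Z]+)*\b"

samples = [
    "UPPERCASEWORD UPPERCASEWORD lowercaseword UPPERCASEWORD UPPERCASEWORD",
    "UPPERCASEWORD lowercaseword 1UPPERCASEWORD",   # digit glued to the next word
    "UPPERCASEWORD lowercaseword 1 UPPERCASEWORD",  # free-standing digit
]

for s in samples:
    # Both spellings of "three or more capitals" give identical results.
    assert re.findall(ANSWER, s) == re.findall(ANSWER_SHORT, s)
    print(re.findall(ANSWER_SHORT, s), re.findall(DIGITS_IN_CLASS, s))
```

The two patterns agree on the first two samples and match the whole line. On the third sample, the answer's pattern returns the two upper-case words as separate matches. The variant returns the whole line as a single match.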
Or, if you want to allow a leading word of just two (or more) upper-case characters, then this:
\b[A-Z]{2,}(?:[\sa-z,]+[0-9]*[A-Z]+)*\b
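To see the difference between `{3,}` and `{2,}`, here is a sketch with a made-up line whose leading word has only two capitals (Python `re` assumed):

```python
import re

THREE_PLUS = r"\b[A-Z]{3,}(?:[\sa-z,]+[0-9]*[A-Z]+)*\b"
TWO_PLUS = r"\b[A-Z]{2,}(?:[\sa-z,]+[0-9]*[A-Z]+)*\b"

text = "AB lowercaseword UPPERCASEWORD"  # hypothetical sample

# {3,} cannot start at "AB", so only the last word matches on its own.
print(re.findall(THREE_PLUS, text))  # ['UPPERCASEWORD']
# {2,} accepts "AB" as the leading word, so the whole line matches.
print(re.findall(TWO_PLUS, text))    # ['AB lowercaseword UPPERCASEWORD']
```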
It would help if you posted some test data.
Update: It sounds like you want something quite different to what you originally described. Will this do, or does it match too much?
\b[A-Z]{2}.*[A-Z]{2}\b
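Because `.*` is greedy, this pattern runs from the first pair of capitals to the last pair of capitals on a line and takes digits along with everything else. The sketch below (Python `re`, with a hypothetical second sample) shows that this is also how it "matches too much": two separate upper-case runs are merged into one match. Without a DOTALL-style flag, `.` does not cross newlines, so each match stays within one line.

```python
import re

GREEDY = r"\b[A-Z]{2}.*[A-Z]{2}\b"

question = "UPPERCASEWORD UPPERCASEWORD lowercaseword 1 UPPERCASEWORD UPPERCASEWORD"
two_runs = "AB CD some text EF GH"  # hypothetical line with two upper-case runs

# The digit is no obstacle: the whole line is one match.
print(re.findall(GREEDY, question) == [question])  # True
# The separate runs "AB CD" and "EF GH" are swallowed into a single match.
print(re.findall(GREEDY, two_runs))  # ['AB CD some text EF GH']
```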
If that matches too much and your tool supports negative lookahead, this might work, but it's getting pretty messy:
\b[A-Z]{2}((?!\b[a-z][a-z'0-9]+\b\s[a-z][a-z'0-9]+).)*[A-Z]{2}\b
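The `((?!...).)*` part consumes one character at a time. Before each character, it checks that two consecutive lower-case words do not start at that position. A match can therefore cross "lowercaseword 1", because "1" does not start with a lower-case letter. It cannot cross something like "some text". The `'` in the class presumably lets contractions count as words. In this sketch (Python `re`, same hypothetical samples as before), the outer parentheses form a capturing group, so `re.findall` would return the group's capture instead of the whole match. `re.finditer` with `group(0)` is used instead. `matches` is just an illustrative helper.

```python
import re

# The answer's lookahead pattern, unchanged.
TEMPERED = r"\b[A-Z]{2}((?!\b[a-z][a-z'0-9]+\b\s[a-z][a-z'0-9]+).)*[A-Z]{2}\b"

def matches(pattern, text):
    # Hypothetical helper: whole-match text for every non-overlapping match.
    return [m.group(0) for m in re.finditer(pattern, text)]

question = "UPPERCASEWORD UPPERCASEWORD lowercaseword 1 UPPERCASEWORD UPPERCASEWORD"
two_runs = "AB CD some text EF GH"

# "lowercaseword 1" is not two lower-case words, so the whole line matches.
print(matches(TEMPERED, question) == [question])  # True
# "some text" blocks the match, so the two runs come back separately.
print(matches(TEMPERED, two_runs))  # ['AB CD', 'EF GH']
```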