Reputation: 97
I tried "[A-Z][A-Z0-9_]*(_[A-Z0-9]+)+" in order to find identifiers that consist of alphanumeric components chained with '_', start with a letter, and contain no lowercase letters, e.g. "ID_RED", "NO_ENTRY_PERMITTED", "THIS_IS4YOU_ALL". I do not want to match "THINKING__NO" or "4YOU_AND_ME".
The mistake seems to be in the second part, "(_[A-Z0-9]+)+"; at the very least it is not as greedy as I expected, since it yields only _RED, _ENTRY, _IS4YOU.
Upvotes: 1
Views: 27
Reputation: 626893
The problem is with the [A-Z0-9_]* part: because the character class includes _, it can match 0 or more consecutive _ chars. Your pattern is also not anchored in any way, so it can find partial matches inside longer words, too.
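Both effects can be reproduced with a small sketch. Python's `re` module is used here only as an assumption, since the question does not say which regex engine is involved, and `full_match_text` is a hypothetical helper name. The last line also shows that a repeated capturing group keeps only its final repetition, which may be related to the partial results reported in the question.

```python
import re

# The pattern from the question: the class before the group also allows '_'.
ORIGINAL = re.compile(r"[A-Z][A-Z0-9_]*(_[A-Z0-9]+)+")

def full_match_text(text):
    """Return the whole substring matched by the original pattern, or None."""
    m = ORIGINAL.search(text)
    return m.group(0) if m else None

# A double underscore is accepted because [A-Z0-9_]* can consume '_'.
print(full_match_text("THINKING__NO"))   # THINKING__NO

# Without anchoring, the match simply starts after the leading digit.
print(full_match_text("4YOU_AND_ME"))    # YOU_AND_ME

# The capturing group holds only its last repetition.
print(ORIGINAL.search("ID_RED").group(1))  # _RED
```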
You may use
\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b
See the regex demo
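As a quick check, here is a minimal sketch that runs the suggested pattern against the sample identifiers from the question. It assumes Python's `re` module (the question does not name a language), and `find_identifiers` is a hypothetical helper name.

```python
import re

# Suggested pattern: word boundaries on both sides, no '_' in the first class,
# and a non-capturing group so findall() returns whole matches.
FIXED = re.compile(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b")

def find_identifiers(text):
    """Return all '_'-chained uppercase identifiers found in text."""
    return FIXED.findall(text)

sample = "ID_RED NO_ENTRY_PERMITTED THIS_IS4YOU_ALL THINKING__NO 4YOU_AND_ME"
print(find_identifiers(sample))
# ['ID_RED', 'NO_ENTRY_PERMITTED', 'THIS_IS4YOU_ALL']
```

Since _ counts as a word character, \b cannot match in the middle of "THINKING__NO" or "4YOU_AND_ME", so no partial match is picked up from either of them.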
Details:
\b - word boundary
[A-Z] - an uppercase letter
[A-Z0-9]* - 0+ uppercase letters or digits
(?:_[A-Z0-9]+)+ - 1 or more occurrences of _ followed by 1+ uppercase letters or digits
\b - word boundary
Upvotes: 2