Stephan Geue

Reputation: 97

Regex: How to search for identifiers \w that do not contain 2 consecutive underscores

I tried "[A-Z][A-Z0-9_]*(_[A-Z0-9]+)+" to find identifiers made of alphanumeric components chained with '_', starting with a letter and containing no lowercase letters, e.g. "ID_RED", "NO_ENTRY_PERMITTED", "THIS_IS4YOU_ALL". I do not want to match "THINKING__NO" or "4YOU_AND_ME".

The mistake seems to be in the second part, "(_[A-Z0-9]+)+": it is not greedy as I expected and yields _RED, _ENTRY, _IS4YOU.

Upvotes: 1

Views: 27

Answers (1)

Wiktor Stribiżew

Reputation: 626893

The problem is with the [A-Z0-9_]* part, which can match 0 or more consecutive _ chars. Your pattern is also not anchored in any way, so it can find partial matches inside longer words, too.
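Both effects can be reproduced with a short sketch. It assumes Python's re module, since the question does not name a language, and full_match_text is only an illustrative helper name. The last line also shows one likely reason for fragments such as _RED: with a capturing group in the pattern, re.findall returns the group's text instead of the whole match.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"[A-Z][A-Z0-9_]*(_[A-Z0-9]+)+")


def full_match_text(text):
    """Return the whole text of the first match, or None if nothing matches."""
    m = ORIGINAL.search(text)
    return m.group(0) if m else None


# [A-Z0-9_]* is allowed to consume the first of the two underscores.
print(full_match_text("THINKING__NO"))   # THINKING__NO

# Without anchoring, the match simply starts one character later.
print(full_match_text("4YOU_AND_ME"))    # YOU_AND_ME

# findall() reports the capturing group, not the full match.
print(ORIGINAL.findall("ID_RED"))        # ['_RED']
```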

You may use

\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b

See the regex demo

Details:

  • \b - word boundary
  • [A-Z] - an uppercase letter
  • [A-Z0-9]* - 0+ uppercase letters or digits
  • (?:_[A-Z0-9]+)+ - 1 or more occurrences of _ and then 1+ uppercase letters or digits
  • \b - word boundary
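
As a quick check, here is a minimal sketch that runs the pattern against the examples from the question. Python's re module is an assumption (the question does not say which engine is used), and find_identifiers is just an illustrative helper name.

```python
import re

# Pattern from the answer: word boundaries on both sides, underscores only
# as single separators, and a non-capturing group so that findall()
# returns the whole identifier.
IDENTIFIER = re.compile(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b")


def find_identifiers(text):
    """Return all '_'-chained uppercase identifiers found in text."""
    return IDENTIFIER.findall(text)


sample = "ID_RED NO_ENTRY_PERMITTED THIS_IS4YOU_ALL THINKING__NO 4YOU_AND_ME"
print(find_identifiers(sample))
# ['ID_RED', 'NO_ENTRY_PERMITTED', 'THIS_IS4YOU_ALL']
```

Since _ counts as a word character, \b cannot match inside "THINKING__NO" or after the leading digit in "4YOU_AND_ME", so neither produces a partial match.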

Upvotes: 2
