Reputation: 1249
I need to distinguish variable names from non-variable names in some expressions I am trying to parse. Variable names start with a colon, can contain (but not begin with) digits, and can contain underscores. So valid variable names are:
:x :_x :x2 :alpha_x // etc
Then I have to pick out other words in the expression that don't begin with colons. So in the following expression:
:result = median(:x,:y,:z)
The variables would be :result, :x, :y, and :z while the other non-variable word would be median.
My regex to pick out the variable names is (this works):
:[a-zA-Z_]{1}[a-zA-Z0-9_]*
But I cannot figure out how to get the non-variable words. My regex for that is:
(?<!:)([a-zA-Z_]{1}[a-zA-Z0-9_]*)
The issue is that the match only excludes the first character after the :, so a variable such as :alpha still yields the partial match lpha.
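A minimal reproduction of this behaviour, as a sketch: the class and method names (PartialMatchRepro, Find) are made up for illustration, and only the pattern itself comes from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PartialMatchRepro
{
    // The non-variable pattern from the question, unchanged.
    public const string Pattern = @"(?<!:)([a-zA-Z_]{1}[a-zA-Z0-9_]*)";

    // Hypothetical helper: returns every match of Pattern in the input.
    public static string[] Find(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // "esult" is the unwanted partial match taken from ":result";
        // the single-letter variables are skipped entirely.
        Console.WriteLine(string.Join(", ", Find(":result = median(:x,:y,:z)")));
        // prints: esult, median
    }
}
```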
Upvotes: 5
Views: 447
Reputation: 627103
The (?<!:)([a-zA-Z_]{1}[a-zA-Z0-9_]*) regex still matches partial variable words because (?<!:) only ensures there is no : immediately to the left of the current location, and then matches an identifier without checking for a word boundary. So, in :alpha, lpha is matched because l is preceded by a char other than :.
Hence the problem is easy to solve by adding a word boundary before [a-zA-Z_]:
var words = Regex.Matches(s, @"(?<!:)\b[a-zA-Z_]\w*", RegexOptions.ECMAScript)
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
See the regex demo. Note you do not need to wrap the whole pattern with a capturing group.
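For completeness, here is a self-contained sketch that runs both patterns against the expression from the question. The class and member names (ExpressionWords, Variables, Words) are assumptions made for the example; the variable pattern is the one from the question with the redundant {1} dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExpressionWords
{
    // Variable pattern from the question: a colon followed by an identifier.
    private static readonly Regex VariableRegex =
        new Regex(@":[a-zA-Z_][a-zA-Z0-9_]*");

    // Non-variable pattern: no colon to the left, and a word boundary so the
    // match cannot start in the middle of an identifier.
    // ECMAScript keeps \w and \b ASCII-only.
    private static readonly Regex WordRegex =
        new Regex(@"(?<!:)\b[a-zA-Z_]\w*", RegexOptions.ECMAScript);

    public static List<string> Variables(string s) =>
        VariableRegex.Matches(s).Cast<Match>().Select(m => m.Value).ToList();

    public static List<string> Words(string s) =>
        WordRegex.Matches(s).Cast<Match>().Select(m => m.Value).ToList();

    public static void Main()
    {
        const string expr = ":result = median(:x,:y,:z)";
        Console.WriteLine(string.Join(", ", Variables(expr))); // :result, :x, :y, :z
        Console.WriteLine(string.Join(", ", Words(expr)));     // median
    }
}
```

Keeping the two patterns as separate static Regex instances avoids re-parsing them on every call, which may matter if many expressions are processed.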
Pattern details
(?<!:) - make sure there is no : immediately to the left of the current location
\b - a word boundary: make sure there are no letters, digits or _ immediately to the left of the current location
[a-zA-Z_] - match an ASCII letter or _
\w* - 0+ ASCII letters, digits or _ (must be used with the ECMAScript option to only match ASCII letters and digits and make word boundary handle ASCII only)
Upvotes: 1
Reputation: 522084
The following pattern seems to work:
(?<=[^A-Za-z0-9_:])[a-zA-Z_]{1}[a-zA-Z0-9_]*
The lookbehind (?<=[^A-Za-z0-9_:]) asserts that what precedes is neither a character allowed in a variable name nor a colon. This then marks the start of a non-variable word.
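One thing to watch for: a positive lookbehind needs an actual character to the left, so a non-variable word at the very start of the input is not matched. The sketch below compares the pattern above with a negative-lookbehind variant, which is my own assumption and not part of the original pattern; the class and helper names are likewise hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindVariant
{
    // Pattern from this answer (redundant {1} dropped): requires a preceding
    // character that is neither an identifier character nor a colon.
    public const string Positive = @"(?<=[^A-Za-z0-9_:])[a-zA-Z_][a-zA-Z0-9_]*";

    // Assumed variant: only forbids an identifier character or a colon on the
    // left, so it also succeeds at the start of the input.
    public const string Negative = @"(?<![A-Za-z0-9_:])[a-zA-Z_][a-zA-Z0-9_]*";

    // Hypothetical helper: returns every match of the given pattern.
    public static string[] Find(string pattern, string input) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        const string expr = "median(:x,:y) + sum(:z)";
        Console.WriteLine(string.Join(", ", Find(Positive, expr))); // sum
        Console.WriteLine(string.Join(", ", Find(Negative, expr))); // median, sum
    }
}
```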
Upvotes: 1