Negative Zero

Reputation: 1224

Vim Syntax Highlighting (Regular Expression)

I am trying to highlight a member variable in Vim using c.vim plugin.

For example, in

struct sockaddr_in sa;
sa.sin_family = AF_INET;

I want to highlight sin_family.

So, here is my syntax match code:

syn match   cCustomMember "\(\.\)\@<=[a-zA-Z0-9_]\+\s*\((\)\@!"
hi def link cCustomMember Number

Basically, what I am trying to say here is that there must be a . in front, followed by one or more word characters, optionally followed by whitespace, and that no opening bracket may follow.

But the above syntax highlighting regular expression doesn't seem to work correctly in Vim. For example, if I have code like this:

getWrapper()->error( NO_VALID_ID, CONNECT_FAIL.code(), CONNECT_FAIL.msg());

.msg and .code are highlighted, except for their last letter. But I don't want member functions (names followed by a round bracket) to be highlighted at all.

I think it's kinda similar to this regex problem in Python:

import re
a = re.compile(r"(?<=\.)(?:\w+)(?!\()")
print(a.search(".test(").group())  # produces "tes", but the desired result is no match
print(a.search(".test").group())   # produces "test"

How do I apply the negative lookahead to the whole group rather than to individual letters?

Upvotes: 1

Views: 3501

Answers (1)

kopischke

Reputation: 3413

Explanation

The issue you are struggling with is due to the fundamental way modern regex engines operate when looking for matches, called backtracking. Jan Goyvaerts put it succinctly in his post “Unintended Backtracking Can Bite You”:

Backtracking occurs when the regular expression engine encounters a regex token that does not match the next character in the string. The regex engine will then back up part of what it matched so far, to try different alternatives and/or repetitions. Understanding this process will make all the difference between guessing and understanding why a regular expression matches what it does and doesn’t.

In your case, the regex engine backtracks when the negative lookahead assertion fails, trying shorter matches until one satisfies it, and both .cod and .ms do. The following shows what happens, with the vertical bar separating the characters the regex has already consumed from the rest of the string .code():

.|code()   # good start => try next char
.c|ode()   # matches => try next char
.co|de()   # matches => try next char
.cod|e()   # matches => try next char
.code|()   # whoops, next char is "(" => track back
.cod|e()   # matches => we’re done here

Note this is only true when you use greedy quantifiers, as you do in your code; a lazy quantifier would match .c. See the Regex Tutorial on lazy versus greedy quantifiers.
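As a rough illustration, the greedy and lazy behaviour can be reproduced with the Python regex from the question. This is only a sketch: the variable names are made up for the example, and Python's re module stands in for Vim's engine, which uses a different syntax.

```python
import re

text = ".code()"

# Greedy: \w+ first grabs "code", the lookahead fails on "(",
# so the engine gives back one character and settles for "cod".
greedy = re.compile(r"(?<=\.)\w+(?!\()")

# Lazy: \w+? takes as little as possible; after "c" the next
# character is "o", not "(", so the lookahead already succeeds.
lazy = re.compile(r"(?<=\.)\w+?(?!\()")

print(greedy.search(text).group())  # cod
print(lazy.search(text).group())    # c
```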

Solution

The obvious way to step around this problem would be to prohibit backtracking before the lookahead, effectively “locking” the part of the pattern the regex has already consumed up to there: member functions would never match. Some regular expression engines will allow you to do exactly this using atomic grouping or even a possessive quantifier (which is essentially syntactic sugar for atomic grouping); the better known are those listed on the pages linked before. Vim’s regex engine has no possessive quantifiers, although its pattern documentation does describe an atomic-grouping item, \@>, which may be worth trying.
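To show what “locking” the consumed part buys you, here is a minimal sketch in Python. It assumes the common trick of emulating an atomic group with a capturing lookahead plus a backreference, which works because the engine does not backtrack into a lookahead once it has matched. The names member and find_members are hypothetical, and the whitespace is moved inside the negative lookahead so that it cannot be backtracked either.

```python
import re

# (?=(\w+))\1 behaves like an atomic group: the lookahead captures the
# whole name once, and \1 must then match exactly that text, so the
# engine cannot shorten the name to satisfy the negative lookahead.
member = re.compile(r"(?<=\.)(?=(\w+))\1(?!\s*\()")

def find_members(line):
    """Return names that follow a '.' and are not followed by '('."""
    return [m.group() for m in member.finditer(line)]

print(find_members("sa.sin_family = AF_INET;"))
print(find_members("error(CONNECT_FAIL.code(), CONNECT_FAIL.msg ());"))
```

With this pattern the first line yields only sin_family, and the second yields nothing, because neither code nor msg can be cut short.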

A somewhat less straightforward and rather more brittle way is to redefine what you are looking for: instead of a negative lookahead assertion matching an opening paren, use a positive lookahead assertion matching all valid characters separating member variables from other code (whitespace, comma, semicolon, closing paren, end of line; check your source for more), essentially anything but an opening paren and another name character. I’ll leave it to you to translate this into Vim’s regex syntax.
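The positive-lookahead idea can be sketched in Python first, before porting it to Vim. The delimiter set below (comma, semicolon, closing paren, closing bracket, equals sign, end of line) is an assumption chosen for the example, not a complete list, and member_var and find_member_vars are hypothetical names.

```python
import re

# The name must be followed, after optional whitespace, by one of the
# assumed delimiters or by the end of the line. A shortened name such
# as "cod" is followed by "e", which is not a delimiter, so
# backtracking can no longer produce a partial match.
member_var = re.compile(r"(?<=\.)\w+(?=\s*(?:[,;)=\]]|$))")

def find_member_vars(line):
    """Return names after a '.' that are followed by an allowed delimiter."""
    return [m.group() for m in member_var.finditer(line)]

print(find_member_vars("sa.sin_family = AF_INET;"))
print(find_member_vars("error(CONNECT_FAIL.code(), CONNECT_FAIL.msg());"))
print(find_member_vars("foo(a.b, c.d)"))
```

The brittleness mentioned above shows up quickly: any delimiter missing from the set (an arithmetic operator, for instance) makes a genuine member variable go unmatched, so the set needs to be tuned to the code being highlighted.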

Upvotes: 4
