Vim Syntax Highlighting (Regular Expression)

Question

I am trying to highlight a member variable in Vim using the c.vim plugin.

For example, in

struct sockaddr_in sa;
sa.sin_family = AF_INET;

I want to highlight sin_family.

So, here is my syntax match code:

syn match   cCustomMember "\(\.\)\@<=[a-zA-Z0-9_]\+\s*\((\)\@!"
hi def link cCustomMember Number

Basically, what I am trying to say here is that there must be a . in front, followed by one or more word characters, optionally followed by whitespace, and that no opening bracket may follow.

But the above syntax highlighting regular expression doesn't seem to work correctly in Vim. For example, if I have code like this:

getWrapper()->error( NO_VALID_ID, CONNECT_FAIL.code(), CONNECT_FAIL.msg());

.msg and .code are highlighted, except for the last letter of each. But I don't want to highlight member functions (names followed by a round bracket) at all.

I think it's similar to this regex problem in Python:

import re

a = re.compile(r"(?<=\.)(?:\w+)(?!\()")
print(a.search(".test(").group())  # produces tes, but the desired result is no match
print(a.search(".test").group())   # produces test

How can I apply the negative lookahead to the whole group rather than to individual letters?

kopischke · Accepted Answer

Explanation

The issue you are struggling with is due to a fundamental mechanism modern regex engines use when looking for matches, called backtracking. Jan Goyvaerts put it succinctly in his post “Unintended Backtracking Can Bite You”:

Backtracking occurs when the regular expression engine encounters a regex token that does not match the next character in the string. The regex engine will then back up part of what it matched so far, to try different alternatives and/or repetitions. Understanding this process will make all the difference between guessing and understanding why a regular expression matches what it does and doesn’t.

In your case, the regex engine backtracks when the negative lookahead fails (that is, when it finds the opening paren), and then tests shorter candidates that do satisfy the pattern. Both .cod and .ms do. The following shows what happens for the string .code(), with the vertical bar separating the characters already consumed by the regex from the rest of the string:

.|code()   # good start => try next char
.c|ode()   # matches => try next char
.co|de()   # matches => try next char
.cod|e()   # matches => try next char
.code|()   # whoops, next char is "(" => backtrack
.cod|e()   # next char is "e", not "(" => we’re done here

Note this is only true when you use greedy quantifiers, as you do in your code; a lazy quantifier would match .c. See the Regex Tutorial on lazy versus greedy quantifiers.
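The trace above can be reproduced with the Python pattern from the question. This is only a small sketch: the names GREEDY, LAZY and first_match are made up for illustration, and the optional whitespace part of the Vim pattern is left out, as in the question's own Python example.

```python
import re

# Pattern from the question: a "." before, word characters, no "(" after.
GREEDY = re.compile(r"(?<=\.)\w+(?!\()")
# Same pattern with a lazy quantifier, for comparison.
LAZY = re.compile(r"(?<=\.)\w+?(?!\()")


def first_match(pattern, text):
    """Return the first matched text, or None if nothing matches."""
    m = pattern.search(text)
    return m.group() if m else None


if __name__ == "__main__":
    for text in (".code()", ".msg()", ".sin_family"):
        # Greedy: grabs the whole name, then gives back one character
        # when the lookahead sees "(". Lazy: stops after one character.
        print(text, first_match(GREEDY, text), first_match(LAZY, text))
```

The greedy pattern gives back exactly one character for the two function calls, while the lazy one never gets past the first letter, so neither variant rejects the member function.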

Solution

The obvious way to step around this problem would be to prohibit backtracking before the lookahead, effectively “locking” the part of the pattern the regex has already consumed up to there: member functions would never match. Some regular expression engines will allow you to do exactly this using atomic grouping or even a possessive quantifier (which is essentially syntactic sugar for atomic grouping); the better known are those listed on the pages linked before. Vim’s regex engine has no possessive quantifiers, but it does provide atomic grouping through the \@> multi (see :help /\@>), so wrapping the name part of the pattern in \(...\)\@> should keep it from being shortened.
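To show the “locking” idea with the question's Python pattern, here is a rough sketch. It deliberately avoids real atomic-group syntax (Python's re module only gained (?>...) and possessive quantifiers in 3.11) and instead uses the common lookahead-plus-backreference emulation. ATOMIC_MEMBER and member_variables are illustrative names, not part of any library.

```python
import re

# (?=(\w+)) captures the longest run of word characters without consuming
# it, and \1 then consumes exactly that run. The engine does not re-enter
# the lookahead to try a shorter capture, so the name is effectively
# "locked" before the negative lookahead (?!\() is tested.
ATOMIC_MEMBER = re.compile(r"(?<=\.)(?=(\w+))\1(?!\()")


def member_variables(line):
    """Return names after a "." that are not directly followed by "("."""
    return [m.group() for m in ATOMIC_MEMBER.finditer(line)]


if __name__ == "__main__":
    print(member_variables("sa.sin_family = AF_INET;"))
    print(member_variables("CONNECT_FAIL.code(), CONNECT_FAIL.msg());"))
```

With the name locked, the lookahead failure on "(" can no longer be worked around by dropping the last letter, so the two function calls produce no match at all.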

A somewhat less straightforward and rather more brittle way is to redefine what you are looking for: instead of a negative lookahead assertion matching an opening paren, use a positive lookahead assertion matching all valid characters separating member variables from other code (whitespace, comma, semicolon, closing paren, end of line; check your source for more), which is essentially anything but an opening paren or another name character. I’ll leave it to you to translate this into Vim’s regex syntax.
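In Python syntax, the positive-lookahead idea could look roughly like the sketch below. The set of allowed follow-up characters is an assumption on my part (optional whitespace, then anything that is not whitespace, a name character or an opening paren, or the end of the string), and MEMBER and find_members are made-up names. Adjust it to your own source before porting it to Vim.

```python
import re

# After the name, allow optional whitespace and then require either a
# character that is not whitespace, a word character or "(", or the end
# of the string. A shortened name such as "cod" fails because the next
# character ("e") is a word character.
MEMBER = re.compile(r"(?<=\.)\w+(?=\s*(?:[^\s\w(]|$))")


def find_members(line):
    """Return all names that look like member variables in the line."""
    return MEMBER.findall(line)


if __name__ == "__main__":
    print(find_members("sa.sin_family = AF_INET;"))
    print(find_members("CONNECT_FAIL.code(), CONNECT_FAIL.msg());"))
    print(find_members("obj.code ();"))
```

This variant also rejects a call with a space before the paren, which the plain negative lookahead would let through. It is still brittle, though: a float literal such as 1.5 would be picked up as a "member" 5.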
