mgiuffrida

Reputation: 3569

Vim regex with back-reference to look-behind fails if a char follows the back-reference

I'm learning Vim-flavored regex and want to understand why this doesn't work.

Say I want to capture everything after a tag up to and including the closing tag:

<div>Test div</div>More words
     ^^^^^^^^^^^^^^

This works but leaves off the trailing >:

/\v%(\<(\w+)\>)@<=.*\<\/\1

So I'd expect this to work, but it captures nothing:

/\v%(\<(\w+)\>)@<=.*\<\/\1\>

I know there are other ways to capture this, but I just want to learn why I can't include a character after the \1 back-reference.

For convenience, here's my understanding of the regex:

/\v  %(           # non-capturing
         \<       # <
         (        # captures group 1
            \w+   # 1+ word chars (letters, digits, underscore)
         )
         \>       # >
      )@<=        # the match should be preceded by all of the above
     .*           # anything
     \<\/         # </
     \1           # that which was captured as group 1
     \>           # >

Upvotes: 4

Views: 112

Answers (1)

Ingo Karkat

Reputation: 172520

Yes, this looks like a bug in the new NFA-based regular expression engine. With the old engine, you have to swap the capturing group and the back-reference (all explained under :help /\@<=), but then the matching works:

\%#=1\v%(\<\1\>)@<=.*\<\/(\w+)\>

Pitfalls like this are one reason the :help agrees with @PeterRincker that it's better to use \zs instead:

\v%(\<(\w+)\>)\zs.*\<\/\1\>

Please report this bug; see :help bugs. You can send the information by email to the vim_dev mailing list, or use the bug tracker.
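
For a bug report, or just to see the difference yourself, it may help to run the patterns outside of an interactive search. The following is a minimal sketch, assuming a Vim build that has both regexp engines (7.4 or later); the variable name text is only illustrative. It runs the pattern from the question under each engine by prefixing \%#=1 (old engine) or \%#=2 (NFA engine), followed by the \zs variant. No expected output is listed, since the result of the look-behind version depends on the engine and the Vim version:

```vim
" Sample line from the question.
let text = '<div>Test div</div>More words'

" Look-behind pattern from the question.
" \%#=N must be the very first item in the pattern:
"   \%#=1 forces the old backtracking engine,
"   \%#=2 forces the NFA engine.
let lookbehind = '\v%(\<(\w+)\>)@<=.*\<\/\1\>'
echo 'old engine, @<=: [' . matchstr(text, '\%#=1' . lookbehind) . ']'
echo 'NFA engine, @<=: [' . matchstr(text, '\%#=2' . lookbehind) . ']'

" \zs variant: the tag is matched normally, and \zs then moves the
" start of the reported match to just after it.
echo 'default, \zs:    [' . matchstr(text, '\v%(\<(\w+)\>)\zs.*\<\/\1\>') . ']'
```

Save this to a file and run it with :source, or paste the lines into the command line one at a time. The brackets make an empty match easy to tell apart from a match on whitespace.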

Upvotes: 3
