Justin C

Reputation: 640

How to match a sequence of characters that doesn't form a specific word?

This may not be possible with regular expressions, and if not, please explain why, or provide a link.

Ultimately, I'm trying to modify my PHP syntax file for Vim to fix a glitch when /* */ comments contain doc tags (e.g. /** @param value description */). I prefer to keep the highlighting even though the doc tags are benign inside /* */.

Currently, one syn match for phpDocTags eats the end-of-comment token (*/).

# vim syntax line
syn match phpDocTags "@\(var\)\s\+\S\+.*" containedin=phpComment

# example php code
/* @var $something 12 34*/

# vim regex match
@var $something 12 34*/

I want to replace the .* in the phpDocTags regex with an expression that says, "Match any character that does not form */", or alternately, "Match any character that is not a * followed by /."

Currently I have a partial solution using negative lookahead. However, it fails to match the character immediately before the */. This works well enough if a space always precedes the */.

# vim syntax line v2
syn match phpDocTags "@\(var\)\s\+\S\+\(.\(\*/\)\@!\)*" containedin=phpComment

# same example php code

# vim regex match v2
@var $something 12 3

So with version 2, the new expression says, "Match any character that is not immediately followed by */."
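The off-by-one can be reproduced outside Vim. The following is a minimal sketch in Python's `re` syntax rather than Vim's, on the assumption that both engines treat this lookahead the same way. The names `consume_then_check` and `check_then_consume` are made up for illustration. The first pattern is version 2 translated; the second only swaps the order of the dot and the lookahead.

```python
import re

line = "/* @var $something 12 34*/"

# Version 2 in Python syntax: consume a character, THEN require that "*/"
# does not follow it. The "4" fails that check, so it is left out.
consume_then_check = re.compile(r"@(var)\s+\S+(?:.(?!\*/))*")

# Reordered: require that "*/" does not start here, THEN consume a character.
# A Vim spelling would presumably be \(\(\*/\)\@!.\)* (untested assumption).
check_then_consume = re.compile(r"@(var)\s+\S+(?:(?!\*/).)*")

v2_match = consume_then_check.search(line).group(0)
reordered_match = check_then_consume.search(line).group(0)

print(repr(v2_match))         # text matched by the version 2 translation
print(repr(reordered_match))  # text matched by the reordered variant
```

In this translation the reordered variant keeps the final character. Note that the greedy `\S+` can still swallow the `*/` by itself when no whitespace precedes it (for example `/* @var $x*/`), so the reordering alone does not cover every input.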

Keep in mind that the */ may be on a different line, so it must be considered optional. Here's a brief list of inputs to test against.

INPUT                                 EXPECTED MATCH
/* @var $something 1234*/             @var $something 1234
/* @var */                            @var      # but prefer no match if possible
/* @var $something 1234               @var $something 1234
/* @var $something 1 / 2 * 34*/       @var $something 1 / 2 * 34

P.S. If there's a flag I can add to phpComment that withholds the */ from contained expressions, please mention that in the comments. The primary focus of this question is regular expressions, not vim's syntax framework.

Upvotes: 0

Views: 137

Answers (2)

romainl

Reputation: 196556

@var\s*\S.*\ze\*/

does what you ask while still conforming to @Qeole's additional "if there is anything after the */" requirement.

\ze is a convenient way to mark the end of the actual match while still providing the regular expression engine with relevant rules.
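Python's `re` module has no `\ze`, but a trailing lookahead plays a similar role: the `*/` must be present, yet it stays out of the match. The sketch below assumes that `(?=\*/)` is a fair stand-in for `\ze\*/`; the helper `matched_text` is hypothetical, and the sample lines are the ones from the question.

```python
import re

# Lookahead stand-in for Vim's \ze: "*/" must follow, but is not part of the match.
pattern = re.compile(r"@var\s*\S.*(?=\*/)")

samples = [
    "/* @var $something 1234*/",
    "/* @var */",
    "/* @var $something 1234",
    "/* @var $something 1 / 2 * 34*/",
]

def matched_text(line):
    """Return the matched substring, or None when the pattern does not match."""
    found = pattern.search(line)
    return found.group(0) if found else None

results = [matched_text(s) for s in samples]
for sample, result in zip(samples, results):
    print(f"{sample!r:40} -> {result!r}")
```

In this translation the third sample, where the closing `*/` is not on the same line, yields no match, because the pattern requires `*/` to follow.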

Upvotes: 2

Bohemian

Reputation: 425033

This regex matches your target:

(?<=\/\* )@var [^*].*?(?=\*\/|$)

See live demo tested against your sample input.

It also doesn't match the empty comment as you hoped.

It works using:

  • A look behind to set the start of capture
  • A reluctant quantifier, in case further comments follow on the same line (see example in demo link)
  • An alternate (either end of comment or end of line) look ahead to set the end of capture

Look aheads don't capture anything, so the entire match (not a group) is your target input.
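The pattern above uses Perl-style lookarounds, which Python's `re` module also accepts, so it can be checked against the question's sample inputs with a short script. This is a sketch only: the `expected` mapping is copied from the question's table (using the preferred "no match" for the empty tag), and Vim's own pattern syntax spells lookarounds differently (`\@<=`, `\@=`), so the pattern would need translating before it could go into a syntax file.

```python
import re

# The pattern from this answer, unchanged.
pattern = re.compile(r"(?<=\/\* )@var [^*].*?(?=\*\/|$)")

# Input line -> match expected by the question (None means "no match").
expected = {
    "/* @var $something 1234*/": "@var $something 1234",
    "/* @var */": None,
    "/* @var $something 1234": "@var $something 1234",
    "/* @var $something 1 / 2 * 34*/": "@var $something 1 / 2 * 34",
}

results = {}
for line in expected:
    found = pattern.search(line)
    results[line] = found.group(0) if found else None

for line, result in results.items():
    status = "ok" if result == expected[line] else "MISMATCH"
    print(f"{status:8} {line!r} -> {result!r}")
```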

Upvotes: -1
