Reputation: 61
I need a regex that matches a specific capturing group which falls inside a multiline comment /* ... */.
In particular, I need to find PHP variable definitions inside multiline comments,
for example:
/* other code $var = value1 */
$var = value2 ;
/*
other code
$var = value3 ;
other code
*/
The regex must match only the two occurrences of '$var =' inside the comments, not the one outside the comment.
For the above example I wrote a regex that uses unrestricted lookbehind, like this:
(?<=[/][\*][^/]+)(\$var) | (?<=[/][\*][^\*]+)(\$var)
But this regex fails whenever both characters * and / occur between the comment opening tag '/*' and $var, even if they are APART from one another, which is not the desired behaviour.
For example, it fails in this case:
$var = .... ;
/*
other * code /
$var = .... ;
other code
*/
because it finds both '*' and '/', even though together they do not form the comment closing tag.
The key point is that I cannot negate a token that is a combination of two characters; I can only negate them one by one: [^*] or [^/].
Furthermore, I cannot use the token [\s\S] instead of [^/] and [^*], because it would also select a $var outside of comments when it is preceded by an earlier comment block.
Any ideas? Is it even possible to achieve this with a normal regex? Or would I need something different?
Upvotes: 3
Views: 466
Reputation: 11
Try this in PHP (it works in Java):
(?s)(?i)(^|\s+?)(/\*)((.)(?!\*/))*?(this)(.*?)(\*/)
In this example the word being searched for is "this".
Upvotes: 0
Reputation: 18490
One idea is to use \G to glue matches to /*:
(?:/\*|\G(?!^))(?:(?!\*/)[^$])*\K\$var\s*=\s*(?:(?!\*/)[^$;])*
This might be hard to understand if you don't work with regexes much. See regex101 for a demo.
\G can be seen as "glue": it continues at the end of the previous match. But \G also matches at the start of the string. That's why the negative lookahead in \G(?!^) is used; we only need it to continue.
/\*|\G(?!^)
This part finds the beginning of a match at /* or continues matching from the previous match.
(?:(?!\*/)[^$])*
Matches any number of characters that are not $ (negated class) while not ending the comment, (?!\*/). This covers the text before or between occurrences of $var.
\K\$var
\K resets the beginning of the reported match, so the match starts at $var. \K can be useful as an alternative to a variable-width lookbehind, which is not available in PCRE.
\s*=\s*(?:(?!\*/)[^$;])*
This matches the value of the variable. It is far from perfect and would need modification for quoted values or if it doesn't suit your input. After = it matches [^$;], characters that are not a dollar sign or a semicolon, as long as there's no */ ahead, (?!\*/).
This regex does not check whether there is actually a comment end */; it just binds matches to /*.
Another idea would be to use a variant of this trick with the verbs (*SKIP)(*FAIL), like in this demo.
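To see how the \G pattern above behaves, here is a minimal sketch that runs it with preg_match_all. The sample text combines the two examples from the question; the names $src, $pattern and $hits are only illustrative, and the trim() call is my addition to drop the trailing whitespace that the value part picks up.

```php
<?php
// Sample input assembled from the question's examples (values are placeholders).
$src = <<<'PHP'
/* other code $var = value1 */
$var = value2 ;
/*
other * code /
$var = value3 ;
other code
*/
PHP;

// The \G-based pattern from this answer, unchanged, with ~ as delimiter.
$pattern = '~(?:/\*|\G(?!^))(?:(?!\*/)[^$])*\K\$var\s*=\s*(?:(?!\*/)[^$;])*~';

preg_match_all($pattern, $src, $m);

// \K makes each reported match start at $var; trim the trailing space.
$hits = array_map('trim', $m[0]);

print_r($hits);
// Expected: the two assignments inside comments ("$var = value1" and
// "$var = value3"); the one on line 2 is outside a comment and is skipped.
```

Note that the stray * and / on the "other * code /" line do not break the match, because only the two-character sequence */ is excluded.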
Upvotes: 1
Reputation: 75222
This matches just $var, and only inside a multiline comment:
(?s)\$var(?=(?:(?!/\*|\*/).)*\*/)
(?:(?!/\*|\*/).)* is a captive lookahead (also known as a Tempered Greedy Token; good name, but too many syllables), and it's how you exclude a sequence, as opposed to a single character. This one matches zero or more of any character (including newline, because of the (?s)), as long as it's not the first character of /* or */.
The enclosing lookahead succeeds if it finds */ without first encountering /*. That means the current position must be inside a comment (there's no need to match the opening /*). And because the lookahead doesn't consume any characters, you can match more than one item per comment, if you need to.
One thing that can fool this regex is a */ that's not really a comment closer. So these:
$var = "*/";
$var = ...;
// */
... would match, even though they're not in a comment.
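As a rough illustration of both the behaviour and the caveat, the sketch below runs the lookahead pattern over a sample built from the question's examples and reports the line number of each hit. The variable names ($src, $lines, $fooled) and the line-number bookkeeping are my own assumptions, not part of the answer.

```php
<?php
// Sample input assembled from the question's examples (values are placeholders).
$src = <<<'PHP'
/* other code $var = value1 */
$var = value2 ;
/*
other * code /
$var = value3 ;
other code
*/
PHP;

// The lookahead pattern from this answer, unchanged, with ~ as delimiter.
$pattern = '~(?s)\$var(?=(?:(?!/\*|\*/).)*\*/)~';

preg_match_all($pattern, $src, $m, PREG_OFFSET_CAPTURE);

// Convert each match offset to a 1-based line number.
$lines = [];
foreach ($m[0] as [$text, $offset]) {
    $lines[] = substr_count(substr($src, 0, $offset), "\n") + 1;
}
print_r($lines); // lines 1 and 5: the two $var occurrences inside comments

// The caveat: a */ inside a string literal is mistaken for a comment closer.
$fooled = preg_match_all($pattern, '$var = "*/";', $ignored);
var_dump($fooled); // int(1), although this $var is not in a comment
```

If string literals or // comments containing */ can occur in the input, a regex alone is probably not enough, and the text would need to be tokenized first.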
Upvotes: 2
Reputation: 91385
How about:
$str = '
/* other code */
$var = "var1";
/*
other code
$var = "var2";
other code
*/
/* other code */
$var = "var3";
/*
other code / <-- a slash here
$var = "var4";
other code
*/';
preg_match_all('~/\*(?:(?!\*/).)+?(\$var = .+?;).*?\*/~s', $str, $m);
print_r($m[1]);
Output:
Array
(
[0] => $var = "var2";
[1] => $var = "var4";
)
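One limitation worth noting: because each match of the pattern above consumes the comment up to its closing */, it captures at most one assignment per comment. If a comment can hold several, a two-pass variant is one option. The sketch below is my own assumption of how that could look, not the answer's method: first collect the /* ... */ blocks, then search inside each block. The sample string and the names $comments and $found are illustrative.

```php
<?php
// Illustrative sample: the first comment holds two assignments.
$src = <<<'PHP'
/* $var = 1; $var = 2; */
$var = 3;
/*
$var = 4;
*/
PHP;

// Pass 1: collect every /* ... */ block (lazy match, dot matches newline).
preg_match_all('~/\*.*?\*/~s', $src, $comments);

// Pass 2: collect every "$var = ...;" inside each block.
$found = [];
foreach ($comments[0] as $comment) {
    preg_match_all('~\$var\s*=\s*[^;]*;~', $comment, $m);
    $found = array_merge($found, $m[0]);
}

print_r($found);
// Expected: "$var = 1;", "$var = 2;" and "$var = 4;" (not "$var = 3;").
```

This still treats a /* or */ inside a string literal as a real comment delimiter, so it shares that weakness with the single-regex approaches.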
Upvotes: 1
Reputation: 47099
Something like this might work:
/\/\*.*?\$var\s*\=\s(.*?)(?=\s*;)/s
Usage:
$str = '$var = .... ;
/*
other code
$var = ..... ;
other code
*/';
preg_match('/\/\*.*?\$var\s*\=\s(.*?)(?=\s*;)/s', $str, $matches);
var_dump($matches);
Will output:
array(2) {
[0]=>
string(26) "/*
other code
$var = ....."
[1]=>
string(5) "....."
}
The value you are looking for is stored in $matches[1].
Upvotes: 0