tutu056

Reputation: 83

Regex matching javadoc comment for a specific method (python)

I'm having trouble coming up with a regex that will match the javadoc comment contents for a specific java method. Example:

/**
 * Do not match this.
 */

/**
 * Do match this.
 */
@SomeAnnotation
public boolean methodX() { }
/**
 * Do not match this.
 */

I already know the method signature so I can use that in the regex.

I can match all of the javadoc comments using:

/\*\*(.*?)\*/

I'm also specifying re.DOTALL. I tried expanding the regex with a negative lookahead that says I only want a javadoc comment if it's the comment immediately preceding the method:

/\*\*(.*?)\*/(?!.*?/\*\*.*?public boolean methodX\(\))

But that causes (.*?) to match everything from the start of the first javadoc comment to the end of the javadoc comment immediately preceding methodX.
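
A minimal reproduction of that behaviour, assuming the sample above is held in a string (the names `source`, `all_comments` and `attempt` are only illustrative):

```python
import re

# Sample text from the question; `source` is an illustrative name.
source = """/**
 * Do not match this.
 */

/**
 * Do match this.
 */
@SomeAnnotation
public boolean methodX() { }
/**
 * Do not match this.
 */
"""

# Plain lazy pattern: finds each javadoc comment separately.
all_comments = re.findall(r"/\*\*(.*?)\*/", source, re.DOTALL)

# The attempt with the negative lookahead.
attempt = re.search(
    r"/\*\*(.*?)\*/(?!.*?/\*\*.*?public boolean methodX\(\))",
    source,
    re.DOTALL,
)

print(len(all_comments))  # number of comments found by the plain pattern
print(attempt.group(1))   # spans the first AND the second comment
```

The captured group contains both "Do not match this." and "Do match this.", which is the unwanted result.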

I keep trying various ways of constructing positive and negative lookaheads but nothing is working. What am I missing?

Upvotes: 1

Views: 529

Answers (2)

wolffer-east

Reputation: 1069

Your (.*?) is matching past the */ of the first comment: . matches any character, including the ones in */, so the lookahead can force the group to keep expanding. Try using:

/\*\*((?:[^*]+|\*[^/])*)\*/

This ensures that you never match the ending */ by accident and end up with two comments matched as one.

EDIT: This version avoids the issue of annotations that contain */. Not sure why they would, but here goes:

/\*\*((?:(?!\*/).)*)\*/(?:(?!/\*\*).)*(?=public boolean methodX)

To confirm that it works, check out this example (http://regex101.com/r/yV9oK2/2). I switched from my original match to a negative lookahead to avoid a 'catastrophic backtrack', as the test program put it :)
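
For a quick check outside regex101, here is a small Python sketch of the same pattern. It assumes the question's sample text and re.DOTALL; the variable names are hypothetical.

```python
import re

# Sample text from the question; `source` is an illustrative name.
source = """/**
 * Do not match this.
 */

/**
 * Do match this.
 */
@SomeAnnotation
public boolean methodX() { }
/**
 * Do not match this.
 */
"""

pattern = re.compile(
    r"/\*\*((?:(?!\*/).)*)\*/"       # comment body, never crossing */
    r"(?:(?!/\*\*).)*"               # gap that never crosses another /**
    r"(?=public boolean methodX)",   # the method must follow
    re.DOTALL,
)

match = pattern.search(source)
body = match.group(1).strip() if match else None
print(body)
```

Group 1 holds only the body of the comment directly above methodX; the surrounding /** and */ are outside the group.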

Upvotes: 1

famousgarkin

Reputation: 14126

This matches the comment (from /** to */) preceding the function in the given example text, capturing it in a named group called comment:

(?P<comment>/\*\*(?:(?!/\*\*).)*?\*/)(?:(?:(?!\*/).)*?)(?=public boolean methodX)

See a test at regex101.com.

  • The key here is to stop the match from crossing an extra /** or */, using (?:(?!/\*\*).)*? and (?:(?!\*/).)*?

  • The ?: prefixes make the uninteresting groups non-capturing, which keeps them out of the result
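
Since the method signature is known in advance, the pattern can be built from it. The sketch below is one way to do that in Python; `find_javadoc` and `source` are hypothetical names, and re.escape is used so the parentheses in the signature are treated literally.

```python
import re

def find_javadoc(source, signature):
    """Return the javadoc comment directly above `signature`, or None."""
    pattern = (
        r"(?P<comment>/\*\*(?:(?!/\*\*).)*?\*/)"  # one comment, no nested /**
        r"(?:(?!\*/).)*?"                         # gap with no further */
        r"(?=" + re.escape(signature) + r")"      # the known signature follows
    )
    match = re.search(pattern, source, re.DOTALL)
    return match.group("comment") if match else None

# Sample text from the question.
source = """/**
 * Do not match this.
 */

/**
 * Do match this.
 */
@SomeAnnotation
public boolean methodX() { }
/**
 * Do not match this.
 */
"""

print(find_javadoc(source, "public boolean methodX()"))
```

The function returns None when the signature does not occur in the text, so callers can tell "no comment found" apart from an empty comment.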

Upvotes: 2
