Sbraaa

regex: extract the text between two strings only when it contains a specific word

I'm refactoring a very big C project and I need to find the parts of the code written by a specific programmer. Fortunately, everyone involved in this project marks their own code with their email address in a standard C-style comment.

Someone could say that this is easily achieved with grep from the command line, but that is not my goal: I may need to remove these comments or replace them with other text, so a regex is the only solution.

Example:

/*********************************************
 *
 * ... some text ....
 *
 * author: [email protected]
 *
 *********************************************/

From this post, I found the right expression to match C-style comments:

\/\*(\*(?!\/)|[^*])*\*\/
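To see what the expression above matches, here is a minimal sketch using Python's standard re module. The question does not say which regex engine is used, so Python is an assumption, and the sample source string is made up for illustration.

```python
import re

# The C-style comment pattern quoted above: "/*", then any mix of
# non-'*' characters and '*' not followed by '/', then "*/".
C_COMMENT = re.compile(r"\/\*(\*(?!\/)|[^*])*\*\/")

source = """int x; /* first comment */
/**
 * second comment
 */
int y;"""

# finditer yields match objects; group(0) is the whole comment text.
comments = [m.group(0) for m in C_COMMENT.finditer(source)]
print(comments)
```

Each match stops at the first */, so separate comments are never merged into one match.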

But that is not enough! I only need the comments that contain a specific email address. Fortunately, the domain of the email address I'm looking for seems to be unique across the whole project, which could make this simpler.

I think I must use a positive lookahead assertion. I've tried this one:

(\/\*)(\*(?!\/)|[^*](?=.*domain.com))*(\*\/)

but it doesn't work! Any advice?
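For illustration, here is a small Python check (the sample comment and the address john@domain.com are hypothetical) showing that the attempted pattern finds nothing in a multi-line comment. A likely reason: the lookahead runs after every non-* character, so every such character, including the ones after the address, must still have domain.com somewhere ahead of it. On top of that, . does not match newlines by default, so without re.DOTALL the lookahead cannot see past the end of the current line.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"(\/\*)(\*(?!\/)|[^*](?=.*domain.com))*(\*\/)")

# A hypothetical comment in the style shown in the question.
comment = """/*****
 *
 * author: john@domain.com
 *
 *****/"""

# Default flags: "." stops at newlines, so the lookahead fails on line 1.
print(attempt.search(comment))  # None

# With DOTALL the lookahead can cross lines, but the characters after
# "@" have no "domain.com" ahead of them, so the match still fails.
print(re.search(attempt.pattern, comment, re.DOTALL))  # None
```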

Answers (1)

Wiktor Stribiżew

You can use

\/\*[^*]*(?:\*(?!\/)[^*]*)*@domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/

See the regex demo

Pattern details:

  • \/\* - comment start
  • [^*]*(?:\*(?!\/)[^*]*)* - any text that does not contain */
  • @domain\.com - the literal text @domain.com
  • [^*]*(?:\*(?!\/)[^*]*)* - any text that does not contain */
  • \*\/ - comment end
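A minimal Python sketch of the goal stated in the question, removing or replacing the matching comments, using re.sub with the pattern above. The sample source and the addresses in it are hypothetical, and domain.com stands for the real domain.

```python
import re

# The first pattern from this answer; "domain.com" stands for the real domain.
AUTHOR_COMMENT = re.compile(
    r"\/\*[^*]*(?:\*(?!\/)[^*]*)*@domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/"
)

source = """/* helper by alice@other.org */
int a;
/*****
 * author: john@domain.com
 *****/
int b;"""

# Replace each matching comment with a placeholder comment.
cleaned = AUTHOR_COMMENT.sub("/* REVIEWED */", source)
print(cleaned)

# Passing an empty replacement deletes the matching comments instead.
removed = AUTHOR_COMMENT.sub("", source)
```

The comment by alice@other.org is left alone because the pattern cannot cross its */ to reach the later @domain.com.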

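A faster alternative, in which the first part matches any text that contains neither the comment end nor @domain.com, so the engine stops at the first occurrence of the address instead of overshooting and backtracking to it: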

\/\*[^*@]*(?:\*(?!\/)[^*@]*|@(?!domain\.com)[^*@]*)*@domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/

See another demo
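To check that the faster alternative finds the same comments as the first pattern, here is a small Python comparison; the sample text and the addresses in it are hypothetical.

```python
import re

# First pattern from this answer.
BASIC = re.compile(
    r"\/\*[^*]*(?:\*(?!\/)[^*]*)*@domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/"
)
# Faster variant: before the address, an "@" is only consumed when it is
# NOT followed by "domain.com", so the loop halts right at the address.
FASTER = re.compile(
    r"\/\*[^*@]*(?:\*(?!\/)[^*@]*|@(?!domain\.com)[^*@]*)*"
    r"@domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/"
)

source = """/* cc: bob@other.org, john@domain.com */
/* no address here */
/**
 * author: john@domain.com
 */"""

# Both patterns have only non-capturing groups, so findall returns whole matches.
print(BASIC.findall(source) == FASTER.findall(source))  # True
```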

In these patterns, I used the unrolled construct [^*]*(?:\*(?!\/)[^*]*)* in place of (\*(?!\/)|[^*])*. Unrolling avoids an alternation inside a quantified group, which reduces backtracking and makes the pattern more efficient.
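A quick Python check, on a few made-up samples, that the alternation form and the unrolled form accept exactly the same comments. The unrolled version follows the "normal* (special normal*)*" shape, where normal is [^*] and special is a * that is not followed by /.

```python
import re

# Alternation form from the question (made non-capturing here).
ALTERNATION = re.compile(r"\/\*(?:\*(?!\/)|[^*])*\*\/")
# Unrolled form used in this answer.
UNROLLED = re.compile(r"\/\*[^*]*(?:\*(?!\/)[^*]*)*\*\/")

samples = [
    "/**/",
    "/* one line */",
    "/*****\n * author\n *****/",
    "/* a */ b */",      # must stop at the first */
    "/* never closed",   # no comment end, so no match
]

def first_comment(pattern, text):
    """Return the comment matched at the start of text, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None

alt_results = [first_comment(ALTERNATION, s) for s in samples]
unrolled_results = [first_comment(UNROLLED, s) for s in samples]
print(alt_results == unrolled_results)  # True
```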
