Bogdan

Regex pattern nested inside other pattern

I have a string with some embedded variables in it, and I need to extract the names of those variables. I'm not well versed in regular expressions and I'm having trouble getting this to work.

Here's an example of how the string looks:

Lorem ipsum dolor sit amet {% #varName1 %}, consectetur adipisicing #non_var elit, sed

{% #varName2|prop1 %} do eiusmod tempor incididunt ut labore et dolore magna aliqua

{% identifier #varName3|prop2 %}. Ut enim ad minim veniam.

Variable names are prefixed with # and are placed inside the delimiters {% and %}. Using this expression I can match the variable names:

(?<=#)(.*?)(?=[\s\|])

However, this also matches #non_var, which is not inside the delimiters and is not a valid variable.
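To see this concretely, here is a minimal Python sketch (Python is only an illustrative choice; the question doesn't name a language). It assumes the line breaks in the sample above are just formatting and joins the string onto one line. The pattern only requires a preceding #, so nothing ties a match to the {% ... %} delimiters:

```python
import re

# Sample string from the question, joined onto one line (assumption).
text = ("Lorem ipsum dolor sit amet {% #varName1 %}, consectetur adipisicing "
        "#non_var elit, sed {% #varName2|prop1 %} do eiusmod tempor incididunt "
        "ut labore et dolore magna aliqua {% identifier #varName3|prop2 %}. "
        "Ut enim ad minim veniam.")

# First attempt: everything after a '#' up to the next whitespace or '|'.
# The delimiters play no part, so '#non_var' is picked up as well.
first_attempt = re.findall(r"(?<=#)(.*?)(?=[\s\|])", text)
print(first_attempt)  # ['varName1', 'non_var', 'varName2', 'varName3']
```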

I've also tried this:

(?<={% )(#(.*?)[^\s\|])(?= %})

But that matches only #varName1 and #varName2|prop1 (and for the latter I don't need the |prop1 part). The expected result is to match:

varName1, varName2 and varName3.
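Running the second attempt through Python's re module (again an illustrative choice of language, with the same one-line sample string) shows why varName3 is missed: the lookbehind requires "{% " directly before the #, but in the third block the word identifier sits in between. The sketch collects whole matches, which still include the # and, for the second block, the |prop1 suffix:

```python
import re

# Same sample string as in the question, joined onto one line (assumption).
text = ("Lorem ipsum dolor sit amet {% #varName1 %}, consectetur adipisicing "
        "#non_var elit, sed {% #varName2|prop1 %} do eiusmod tempor incididunt "
        "ut labore et dolore magna aliqua {% identifier #varName3|prop2 %}. "
        "Ut enim ad minim veniam.")

# Second attempt, verbatim: '{% ' must sit immediately before the '#'
# and ' %}' immediately after the last character of the match.
pattern = r"(?<={% )(#(.*?)[^\s\|])(?= %})"

# The pattern has two groups, so read group(0) (the whole match) from
# finditer instead of using findall, which would return tuples.
second_attempt = [m.group(0) for m in re.finditer(pattern, text)]
print(second_attempt)  # ['#varName1', '#varName2|prop1']
```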

Any suggestions would be greatly appreciated.

Answers (2)

Roney Michael

Since you haven't mentioned which language or application you're using, I'll give a general solution; I've tested it successfully in Notepad++.

You could use the regex:

(\{%[^#]*#)([\w]*)(.*?%\})

Here the variable name may consist of any number of letters, digits and underscores. If you want to enforce the condition that the first character of the variable name must not be a digit, use:

(\{%[^#]*#)([A-Za-z_][\w]*)(.*?%\})

This matches everything from {% to %}, inclusive. You can then use a backreference to the second capturing group ($2 in Notepad++, \2 in many programming languages) to get the variable names.
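As an illustration, in a programming language the second group can be read from each match object. This is a sketch assuming Python's re module and the one-line sample string. The strict variant is written with [A-Za-z_] so it does not rely on case-insensitive matching, and the test string "{% #1bad %} {% #_ok|x %}" is made up purely for illustration:

```python
import re

text = ("Lorem ipsum dolor sit amet {% #varName1 %}, consectetur adipisicing "
        "#non_var elit, sed {% #varName2|prop1 %} do eiusmod tempor incididunt "
        "ut labore et dolore magna aliqua {% identifier #varName3|prop2 %}. "
        "Ut enim ad minim veniam.")

# Group 1: "{%" up to and including "#"; group 2: the variable name;
# group 3: the rest of the block up to "%}".
pattern = r"(\{%[^#]*#)([\w]*)(.*?%\})"
names = [m.group(2) for m in re.finditer(pattern, text)]
print(names)  # ['varName1', 'varName2', 'varName3']

# Stricter variant: the name must start with a letter or underscore,
# so a block whose name starts with a digit produces no match at all.
strict = r"(\{%[^#]*#)([A-Za-z_][\w]*)(.*?%\})"
strict_names = [m.group(2) for m in re.finditer(strict, "{% #1bad %} {% #_ok|x %}")]
print(strict_names)  # ['_ok']
```

Because [^#]* cannot cross a #, each match is anchored to the first # after its own {%, which is what keeps #non_var out.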

For your input text,

Lorem ipsum dolor sit amet {% #varName1 %}, consectetur adipisicing #non_var elit, sed {% #varName2|prop1 %} do eiusmod tempor incididunt ut labore et dolore magna aliqua {% identifier #varName3|prop2 %}. Ut enim ad minim veniam.

a search and replace using $2 as the replacement gave me:

Lorem ipsum dolor sit amet varName1, consectetur adipisicing #non_var elit, sed varName2 do eiusmod tempor incididunt ut labore et dolore magna aliqua varName3. Ut enim ad minim veniam.
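The same search and replace can be reproduced in Python (an assumed choice of language) with re.sub, where \2 in the replacement string refers to the second group. This minimal sketch should print exactly the text shown above:

```python
import re

text = ("Lorem ipsum dolor sit amet {% #varName1 %}, consectetur adipisicing "
        "#non_var elit, sed {% #varName2|prop1 %} do eiusmod tempor incididunt "
        "ut labore et dolore magna aliqua {% identifier #varName3|prop2 %}. "
        "Ut enim ad minim veniam.")

# Replace each whole {% ... %} block with just its second group (the name).
# Text outside the delimiters, including '#non_var', is left untouched.
result = re.sub(r"(\{%[^#]*#)([\w]*)(.*?%\})", r"\2", text)
print(result)
```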

Dave Sexton

Try this - I think this is right:

(?<=\{%.*#)[\w|]+(?=.*%\})
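This pattern puts .* inside the lookbehind, i.e. a variable-width lookbehind, which only some engines (.NET, for example) accept. Also, the class [\w|] admits |, so where the pattern is supported it would run on through |prop1 rather than stopping at varName2. A hedged Python sketch: Python's re rejects the pattern at compile time, and the rewrite below, which is a hypothetical alternative and not the answer's own pattern, moves the {% prefix into the match and captures only the name:

```python
import re

text = ("Lorem ipsum dolor sit amet {% #varName1 %}, consectetur adipisicing "
        "#non_var elit, sed {% #varName2|prop1 %} do eiusmod tempor incididunt "
        "ut labore et dolore magna aliqua {% identifier #varName3|prop2 %}. "
        "Ut enim ad minim veniam.")

# Python's re only allows fixed-width lookbehind, so ".*" inside
# (?<=...) makes compilation fail with re.error.
try:
    re.compile(r"(?<=\{%.*#)[\w|]+(?=.*%\})")
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False
print(lookbehind_supported)  # False

# Hypothetical rewrite without lookbehind: match the whole {% ... %}
# block and capture only the name. [^%#]* cannot run past the first '#'
# in the block, and \w+ stops before '|'.
names = re.findall(r"\{%[^%#]*#(\w+)[^%]*%\}", text)
print(names)  # ['varName1', 'varName2', 'varName3']
```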
