Lark

Reputation: 4694

RegEx that will capture everything between two characters including multiline blocks

I want to capture all text & blocks of text between <% and %>.

For example:

<html>
<head>
<title>Title Here</title>
</head>
<body>
<% include("/path/to/include") %>
<h1>Test Template</h1>
<p>Variable: <% print(second_var) %></p>
<%

variable = value;

foreach(params here)
{
    code here
}

%>
<p><a href="/" title="Home">Home</a></p>
</body>
</html>

I have tried \<\%(.*)\%\>, but that captures everything, including the <h1>Test Template</h1> block as well.

Upvotes: 22

Views: 98545

Answers (3)

Tim Pietzcker

Reputation: 336158

Which regex engine are you using?

<%(.*?)%>

should work with the "dot matches newline" option enabled. If you don't know how to set that, try

<%([\s\S]*?)%>

or

(?s)<%(.*?)%>

No need to escape <, %, or > by the way.
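As a rough illustration of the three variants above, here is a minimal Python sketch. Python is only an assumption, since the question never names its engine; `re.DOTALL` is Python's name for the "dot matches newline" option, and `TEMPLATE` is a hypothetical, shortened version of the example in the question.

```python
import re

# Hypothetical sample modeled on the template in the question.
TEMPLATE = """<body>
<% include("/path/to/include") %>
<h1>Test Template</h1>
<p>Variable: <% print(second_var) %></p>
<%

variable = value;

%>
</body>"""

# Three ways to let the lazy group span line breaks.
PATTERNS = {
    "DOTALL flag": re.compile(r"<%(.*?)%>", re.DOTALL),
    "[\\s\\S] class": re.compile(r"<%([\s\S]*?)%>"),
    "inline (?s)": re.compile(r"(?s)<%(.*?)%>"),
}

# findall returns the captured group for every match; strip() trims
# the whitespace just inside the <% and %> delimiters.
results = {
    name: [block.strip() for block in pattern.findall(TEMPLATE)]
    for name, pattern in PATTERNS.items()
}

for name, blocks in results.items():
    print(name, blocks)
```

In this sketch all three patterns yield the same three blocks, and the <h1> line is not captured.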

Upvotes: 58

Rafe Kettler

Reputation: 76955

\<\%(.*?)\%\>. You need to use .*? to get non-greedy pattern matching.

EDIT: To solve the multiline problem, you can't rely on the . wildcard alone, because by default it matches everything except newline. The option that changes this differs depending on your regular expression engine, so I can tell you what to do if you tell me which engine you are using.
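To make both points concrete, here is a small sketch in Python (an assumed engine, since the question does not name one). The strings `line` and `multiline` are made-up inputs for illustration.

```python
import re

# Hypothetical single-line input with two tags around an <h1> block.
line = "<% a %><h1>Test Template</h1><% b %>"

# Greedy .* runs to the LAST %>, swallowing the <h1> block.
greedy = re.findall(r"<%(.*)%>", line)
# Lazy .*? stops at the FIRST %> after each <%.
lazy = re.findall(r"<%(.*?)%>", line)

# Hypothetical tag that spans several lines.
multiline = "<%\nvariable = value;\n%>"

# By default . does not match newline, so this finds nothing.
without_dotall = re.findall(r"<%(.*?)%>", multiline)
# With Python's re.DOTALL flag, . matches newline too.
with_dotall = re.findall(r"<%(.*?)%>", multiline, re.DOTALL)

print(greedy)          # [' a %><h1>Test Template</h1><% b ']
print(lazy)            # [' a ', ' b ']
print(without_dotall)  # []
print(with_dotall)     # ['\nvariable = value;\n']
```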

Upvotes: 9

Stijn Sanders

Reputation: 36840

I've been using Microsoft's Regex engine (provided by JScript in IE). It has a 'multi-line' switch, but that changes where ^ and $ match rather than what . matches, so I still had problems that I had to resolve using [\u0000-\uFFFF], which matches everything, including EOLs and any control chars...

So have a go with <%([\u0000-\uFFFF]*?)%>
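The same character class can be tried outside JScript. The sketch below uses Python, which is an assumption made only for illustration; the `sample` and `astral` strings are made up. One difference to be aware of: Python patterns work on code points rather than UTF-16 units, so there the range does not cover characters above U+FFFF.

```python
import re

# The class [\u0000-\uFFFF] covers every character in the Basic
# Multilingual Plane, including newlines, tabs and other control chars.
PATTERN = re.compile(r"<%([\u0000-\uFFFF]*?)%>")

# Hypothetical block containing newlines and a tab.
sample = "<p><%\nvariable = value;\t\n%></p>"
captured = PATTERN.findall(sample)

# Hypothetical block containing a character above U+FFFF; in Python
# the class does not match it, so the block is not captured.
astral = "<%\U0001F600%>"
captured_astral = PATTERN.findall(astral)

print(captured)         # ['\nvariable = value;\t\n']
print(captured_astral)  # []
```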

Upvotes: 3
