Michael

Reputation: 2414

RegEx match with whitespace but exclude in Output

I am trying to match multiple lines of source code between a start marker (<%) and an end marker (%>), but I need the captured group to exclude any trailing newlines at the end of the matched code block. Because I am doing a replace, the match itself must still extend through the end marker (%>), whether or not newlines precede it.

Simplified example:

<%
    SomeCode1
    SomeCode2

    SomeCode3
    SomeCode4

%>

The goal is to output, via a group reference such as $x, the code lines from SomeCode1 through SomeCode4, keeping the blank line between SomeCode2 and SomeCode3 but dropping any newline(s) after SomeCode4.

What I have come up with so far (simplified here) is to replace <%([ \t\r\n]*)([^%]*)%> with x$2y. The actual situation is more complex, but this shows the basic problem: group 2 also captures any extra newlines between SomeCode4 and %>, so they end up in the result. How do I replace the whole block while excluding the trailing newlines?
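
To make the problem reproducible outside Visual Studio, here is a minimal sketch in Python (an assumption for illustration only; the question itself targets Visual Studio's search and replace, where group references are written $2 rather than \g<2>). It applies the pattern above to the simplified example and shows that the trailing newlines survive:

```python
import re

# The pattern from the question, unchanged. Group 2 is greedy, so it
# swallows everything up to the '%' of the closing marker.
attempt = re.compile(r"<%([ \t\r\n]*)([^%]*)%>")

source = (
    "<%\n"
    "    SomeCode1\n"
    "    SomeCode2\n"
    "\n"
    "    SomeCode3\n"
    "    SomeCode4\n"
    "\n"
    "%>"
)

# Python spells the group reference \g<2>; Visual Studio uses $2.
result = attempt.sub(r"x\g<2>y", source)

# repr() makes the unwanted "\n\n" before the final "y" visible.
print(repr(result))
```

The two newlines between SomeCode4 and %> stay inside group 2, which is exactly the behavior being asked about.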

Update 1: The real goal is to combine adjacent inline ASPX VB.NET code blocks in a clean way that tabs well. Example input:

<% SomeCode()
   SomeCode2()
%>
<%
   SomeCode3()
   SomeCode4()
%>

The following replace seems to work pretty well for combining the blocks without extraneous newlines between them, although the final result may still contain some extra newlines at the end:

Replacing:

(\r\n)([\s]*)<%(?!=|-)[\s\r]*([^%]*?)[\s\r]*%>[\s\r]*<%(?!=|-)[\s\r]*([^%]*)

With:

$1$2<%$1$2    $3$1$2    $4$1$2

Output of the above example with the above replace (the newline after SomeCode2() is removed, but a newline still remains at the end of the result):

<%
   SomeCode()
   SomeCode2()
   SomeCode3()
   SomeCode4()

%>

For those wondering, a simple replace of %>[\s\r]*<% is not viable because it could also match comments (<%--abc--%>) and the other inline code expressions (<%=abc%>), both of which need to be excluded from the replace operation.
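
A small sketch of that failure mode, again in Python purely for illustration (the sample input and variable names are made up, not taken from the real project):

```python
import re

# The "simple" approach: join wherever one block ends and the next begins.
naive = re.compile(r"%>[\s\r]*<%")

# A code block followed by an inline expression, which must stay separate.
page = "<% A() %>\n<%=abc%>"

# Replacing the junction with a newline merges the two constructs.
broken = naive.sub("\n", page)

print(repr(broken))
```

The expression's opening <% is consumed, so its leading = ends up inside the code block. The (?!=|-) lookahead after <% in the patterns above is what prevents a match from starting a block at <%= or <%--.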

Update 2 (seems good): With the help of Wiktor Stribiżew in the answers and comments, I was able to find something that is short and seems to work as desired in both Visual Studio 2017 and in the Online Demo:

Replacing:

(\r?\n)([ \t]*)<%(?!=|-)[\s]*([^%]*?)[\s]*%>[\s]*<%(?!=|-)[\s]*([^%]*?)[\s]*(%>)

With:

$1$2<%$1$2    $3$1$2    $4$1$2$5

Be sure to see Wiktor's demos in the comments for alternative syntax.
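
For anyone who wants to try the Update 2 replace outside Visual Studio, here is a rough Python sketch. It is an approximation under stated assumptions: Python's re module is not the Visual Studio engine, group references are written \g<1> instead of $1, and the sample input is the Update 1 example with a leading newline added so that the (\r?\n) prefix has something to match.

```python
import re

# The Update 2 pattern, unchanged apart from being a Python raw string.
PATTERN = re.compile(
    r"(\r?\n)([ \t]*)<%(?!=|-)[\s]*([^%]*?)[\s]*%>"
    r"[\s]*<%(?!=|-)[\s]*([^%]*?)[\s]*(%>)"
)

# The Update 2 replacement with $n rewritten as \g<n>.
REPLACEMENT = (
    r"\g<1>\g<2><%"
    r"\g<1>\g<2>    \g<3>"
    r"\g<1>\g<2>    \g<4>"
    r"\g<1>\g<2>\g<5>"
)

page = (
    "\n"
    "<% SomeCode()\n"
    "   SomeCode2()\n"
    "%>\n"
    "<%\n"
    "   SomeCode3()\n"
    "   SomeCode4()\n"
    "%>\n"
)

merged = PATTERN.sub(REPLACEMENT, page)
print(merged)
```

In this sketch the two blocks collapse into one with no blank line before the closing %>. Only the first line of each original block is re-indented by the replacement; the remaining lines keep whatever indentation they had in the input.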

Upvotes: 1

Views: 186

Answers (1)

Wiktor Stribiżew

Reputation: 626896

You may use

<%([\s\r]*)([^%]*?)[\s\r]*%>

Details

  • <% - a literal substring
  • ([\s\r]*) - Group 1 (may be referred to with $1 from the replacement pattern): zero or more whitespace characters (\r is listed explicitly because in VS S&R \s does not match \r)
  • ([^%]*?) - Group 2: zero or more characters other than %, as few as possible (*? is a lazy quantifier, so the subsequent patterns are tried first, and only if they fail to match is this pattern "expanded" by one more character)
  • [\s\r]* - zero or more whitespace characters
  • %> - a literal substring.
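
As a quick check of the breakdown above, here is a minimal Python sketch (an illustration only; the question targets Visual Studio, where the reference is $2 rather than \g<2>, and in Python \s already matches \r, so [\s\r] is redundant but harmless). The x...y replacement is the simplified one from the question.

```python
import re

# The pattern from this answer. The lazy group 2 stops as soon as the
# rest of the pattern (optional whitespace, then "%>") can match.
pattern = re.compile(r"<%([\s\r]*)([^%]*?)[\s\r]*%>")

source = (
    "<%\n"
    "    SomeCode1\n"
    "    SomeCode2\n"
    "\n"
    "    SomeCode3\n"
    "    SomeCode4\n"
    "\n"
    "%>"
)

match = pattern.search(source)
body = match.group(2)  # the code without surrounding whitespace

result = pattern.sub(r"x\g<2>y", source)
print(repr(result))
```

The trailing newlines are still consumed by the match, so the whole block through %> is replaced, but they fall outside group 2 and therefore do not appear in the output.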

Upvotes: 1
