Wayne

Reputation: 35

Regex if match then replace

I've looked around to try to find an answer to this question, but I can't find exactly what I'm looking for. It seems like there should be a way to decide if there is a match and replace, otherwise do something else without the need to repeat the matching.

I'm trying to decide if the test string contains an HTML document that ends in

</body></html>

and inject some text directly ahead of those tags. Of course there might be a combination of white-space/carriage-returns/line-feeds between the 2 tags, thus I'm trying this with Regex. However, the test string might be just plain text and if the Regex match fails, I will just append the text to the end of the string. And of course, I'm probably making this more difficult than it really is.

I don't really have any code to show here since I can't figure out if this is possible with the .NET Regex implementation, but here is some pseudo-code showing what I would like to do:

        Dim testString As String = some file contents
        Dim reg As New Regex("(<\/body>\s*<\/html>)", RegexOptions.IgnoreCase)
        Dim rMatch As Match = reg.Match(testString)
        If rMatch.Success Then
            rMatch.Replace(newString)
        Else
            testString &= alternateNewString
        End If

Of course I would need to put the end body and end html tags into newString to properly close the document, but that should be no problem. The part I can't seem to implement is the match replacement without the need to run the regex again. It seems like calling Match to determine whether there is a match, then calling Replace, makes the regex run twice. And again, I might be overthinking this, or prematurely optimizing. What do you think?

Upvotes: 1

Views: 1731

Answers (1)

41686d6564

Reputation: 19641

If I understand you correctly, you're trying to do something like this (which isn't so good, see below):

Dim testString As String = "Your original string"
Dim newStr As String = String.Empty
Dim textToInsert As String = "Your text to 'inject'"

Dim reg As New Regex("<\/body>\s*<\/html>", RegexOptions.IgnoreCase)
newStr = reg.Replace(testString, textToInsert & Environment.NewLine & "</body></html>")
If newStr = testString Then
    newStr = testString & Environment.NewLine & textToInsert
End If

That will work, but in terms of performance it is no better than matching twice.
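If the second scan is the concern, one option is to use what the Match object already knows. The sketch below assumes that inserting the text at Match.Index with String.Insert is acceptable in place of a regex replacement; the module and function names (MatchOnceDemo, Inject) are made up for illustration and are not part of the original code.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchOnceDemo
    ' Hypothetical helper: inserts textToInsert ahead of the closing tags,
    ' or appends it when the tags are not found.
    Function Inject(input As String, textToInsert As String) As String
        Dim reg As New Regex("</body>\s*</html>", RegexOptions.IgnoreCase)
        Dim m As Match = reg.Match(input)   ' the only regex scan
        If m.Success Then
            ' m.Index is where the match starts, so no second search is needed.
            Return input.Insert(m.Index, textToInsert & Environment.NewLine)
        End If
        Return input & Environment.NewLine & textToInsert
    End Function

    Sub Main()
        Console.WriteLine(Inject("<html><body>Hi</body></html>", "<p>note</p>"))
        Console.WriteLine(Inject("just plain text", "note"))
    End Sub
End Module
```

Because the text is spliced in as a plain string, the closing tags stay exactly as they were in the input and do not need to be repeated in the inserted text.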

A better alternative is to let the regex do all the work for you (i.e. match and replace either the closing tags OR the end of the string). In that case, you could change your pattern to look like this: \s*(<\/body>\s*<\/html>)|$.

Note:

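  • |$ basically means "or the end of the string". In .NET, without RegexOptions.Multiline, $ also matches just before a final newline at the end of the string; \z matches only at the very end.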
  • Your original pattern was put in a capturing group () so you can access it later when replacing.

With this approach, your code would look something like the following:

Dim testString As String = "Your original string"
Dim newStr As String = String.Empty
Dim textToInsert As String = "Your text to 'inject'"

Dim reg As New Regex("\s*(<\/body>\s*<\/html>)|$", RegexOptions.IgnoreCase)
newStr = reg.Replace(testString, Environment.NewLine & textToInsert &
                     Environment.NewLine & "$1", 1)

Where:

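  • $1 represents the first group, which is basically </body> and </html> with any number of whitespace characters in between. When the pattern matches the end of the string instead, the group is empty and $1 expands to nothing.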
  • The last argument in the Replace function is the maximum number of matched strings that should be replaced. It is set to 1 in order to prevent inserting the text before both the closing tags and the end of the string.
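One caveat with a replacement string is that sequences such as $1 inside the inserted text are themselves treated as substitutions, so text like "Total: $1.50" would be altered. A possible workaround, sketched here under the assumption that the inserted text should always be taken literally, is to pass a MatchEvaluator instead of a replacement string. The names EvaluatorDemo and Inject are hypothetical; the pattern is the same as above, minus the optional backslashes before the slashes.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EvaluatorDemo
    ' Hypothetical helper name; the pattern matches the closing tags or the end of the string.
    Function Inject(input As String, textToInsert As String) As String
        Dim reg As New Regex("\s*(</body>\s*</html>)|$", RegexOptions.IgnoreCase)
        ' The lambda's return value is used literally, so "$" sequences in
        ' textToInsert are not interpreted as substitutions.
        ' m.Groups(1).Value is empty when only the end of the string matched.
        Return reg.Replace(input,
            Function(m As Match) Environment.NewLine & textToInsert &
                                 Environment.NewLine & m.Groups(1).Value,
            1)
    End Function

    Sub Main()
        Dim html As String = "<html><body>Hi</body>" & Environment.NewLine & "</html>"
        Console.WriteLine(Inject(html, "Total: $1.50"))
        Console.WriteLine(Inject("just plain text", "Total: $1.50"))
    End Sub
End Module
```

The count argument of 1 plays the same role as before: only the first match (the closing tags if present, otherwise the end of the string) is replaced.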

Hope that helps :)

Upvotes: 1
