Reputation: 35
I've looked around for an answer to this question, but I can't find exactly what I'm looking for. It seems like there should be a way to check for a match and replace it if found, or otherwise do something else, without having to repeat the matching.
I'm trying to determine whether the test string contains an HTML document that ends in </body></html> and, if so, inject some text directly ahead of those tags. Of course, there might be any combination of whitespace, carriage returns and line feeds between the two tags, which is why I'm using a Regex. However, the test string might be just plain text, and if the Regex match fails, I will simply append the text to the end of the string. And of course, I'm probably making this more difficult than it really is.
I don't really have any code to show here since I can't figure out whether this is possible with the .NET Regex implementation, but here is some pseudo-code showing what I would like to do:
Dim testString As String ' = some file contents
Dim reg As New Regex("(<\/body>\s*<\/html>)", RegexOptions.IgnoreCase)
Dim rMatch As Match = reg.Match(testString)
If rMatch.Success Then
    rMatch.Replace(newString) ' pseudo-code: Match has no Replace method
Else
    testString &= alternateNewString
End If
Of course, I would need to put the closing body and html tags into newString to properly close the document, but that should be no problem. The part I can't seem to implement is replacing the match without running the regex again. It seems like calling Match to check whether there is a match and then calling Replace makes the regex run twice. Again, I might be overthinking this, or optimizing prematurely. What do you think?
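For what it's worth, the Match object already carries everything needed to do the replacement without a second regex pass: rMatch.Index and rMatch.Length locate the matched tags, so ordinary string operations can splice in newString. Below is a minimal sketch of that idea, assuming the question's variable roles; the helper name InjectBeforeClosingTags, the module name and the sample strings are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SpliceSketch
    ' Hypothetical helper: runs the regex once, then uses the Match's
    ' Index and Length to splice in the replacement with plain string ops.
    Function InjectBeforeClosingTags(testString As String,
                                     newString As String,
                                     alternateNewString As String) As String
        Dim reg As New Regex("<\/body>\s*<\/html>", RegexOptions.IgnoreCase)
        Dim rMatch As Match = reg.Match(testString)
        If rMatch.Success Then
            ' Keep the text before the match, insert newString
            ' (which must re-close the document), keep the text after it.
            Return testString.Substring(0, rMatch.Index) &
                   newString &
                   testString.Substring(rMatch.Index + rMatch.Length)
        Else
            ' No closing tags found: treat it as plain text and append.
            Return testString & alternateNewString
        End If
    End Function

    Sub Main()
        Console.WriteLine(InjectBeforeClosingTags("<html><body>Hi</body>" & vbCrLf & "</html>",
                                                  "<p>Injected</p></body></html>",
                                                  " (appended)"))
        Console.WriteLine(InjectBeforeClosingTags("plain text",
                                                  "<p>Injected</p></body></html>",
                                                  " (appended)"))
    End Sub
End Module
```

Here the only regex work is the single reg.Match call; everything after it is string slicing.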
Upvotes: 1
Views: 1731
Reputation: 19641
If I understand you correctly, you're trying to do something like this (which isn't so good, see below):
Dim testString As String = "Your original string"
Dim newStr As String = String.Empty
Dim textToInsert As String = "Your text to 'inject'"
Dim reg As New Regex("<\/body>\s*<\/html>", RegexOptions.IgnoreCase)
newStr = reg.Replace(testString, textToInsert & Environment.NewLine & "</body></html>")
If newStr = testString Then
    newStr = testString & Environment.NewLine & textToInsert
End If
That works, but performance-wise it is no better than matching twice.
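A variant of the code above that avoids the string comparison is to pass a MatchEvaluator to Regex.Replace and have it set a flag when it runs; if the flag is still False afterwards, nothing matched and the text can be appended instead. This is a minimal sketch, not part of the original answer; Inject, EvaluatorSketch and the sample inputs are hypothetical names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EvaluatorSketch
    ' Hypothetical helper: one Regex.Replace call whose MatchEvaluator
    ' records whether a replacement happened, so the "no match" case is
    ' detected without comparing the old and new strings.
    Function Inject(testString As String, textToInsert As String) As String
        Dim reg As New Regex("<\/body>\s*<\/html>", RegexOptions.IgnoreCase)
        Dim replaced As Boolean = False

        Dim result As String = reg.Replace(testString,
            Function(m As Match)
                replaced = True ' only runs when the closing tags are found
                Return textToInsert & Environment.NewLine & "</body></html>"
            End Function)

        If Not replaced Then
            ' Plain text: append, as in the comparison-based version.
            result = testString & Environment.NewLine & textToInsert
        End If
        Return result
    End Function

    Sub Main()
        Console.WriteLine(Inject("<html><body>Hi</body>" & vbCrLf & "</html>", "<p>Injected</p>"))
        Console.WriteLine(Inject("Just plain text", "Appended line"))
    End Sub
End Module
```

Like the original, this replaces every occurrence of the closing tags; the flag only tells you whether at least one was found.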
So, a better alternative is to let the regex do all the work for you (i.e. match and replace either the closing tags OR the end of the string). In that case, you could change your pattern to: \s*(<\/body>\s*<\/html>)|$
Note: |$ basically means "or the end of the string". The closing tags are wrapped in a capturing group, (), so you can access them later when replacing.
Using this approach, your code would look something like the following:
Dim testString As String = "Your original string"
Dim newStr As String = String.Empty
Dim textToInsert As String = "Your text to 'inject'"
Dim reg As New Regex("\s*(<\/body>\s*<\/html>)|$", RegexOptions.IgnoreCase)
newStr = reg.Replace(testString, Environment.NewLine & textToInsert &
                     Environment.NewLine & "$1", 1)
Where:
- $1 represents the first group, which is basically </body> and </html> with any number of whitespace characters in between.
- The last argument of the Replace function is the maximum number of matches to replace. It is set to 1 to prevent inserting the text before both the closing tags and the end of the string.
Hope that helps :)
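Putting the single-pattern approach into a self-contained program makes it easy to check both cases: an HTML document gets the text inserted ahead of the closing tags, and plain text gets it appended at the end. The module name, the Inject helper and the sample inputs below are hypothetical; the pattern and the Replace call are the ones from the answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SinglePatternDemo
    ' Hypothetical helper wrapping the answer's pattern and Replace call.
    Function Inject(testString As String, textToInsert As String) As String
        ' Matches either the closing tags (captured in group 1, with any
        ' whitespace before them consumed by the leading \s*) or the end.
        Dim reg As New Regex("\s*(<\/body>\s*<\/html>)|$", RegexOptions.IgnoreCase)
        ' count = 1: only the leftmost match is replaced. When the tags are
        ' absent, group 1 does not participate and $1 expands to "".
        Return reg.Replace(testString,
                           Environment.NewLine & textToInsert & Environment.NewLine & "$1",
                           1)
    End Function

    Sub Main()
        Console.WriteLine(Inject("<html><body>Hi</body>" & vbLf & "</html>", "<p>Injected</p>"))
        Console.WriteLine(Inject("Just plain text", "Appended line"))
    End Sub
End Module
```

Two things to watch: if textToInsert can itself contain $, escape it first with textToInsert.Replace("$", "$$"), because the replacement string is parsed for substitutions. Also, without RegexOptions.Multiline, $ matches just before a trailing "\n" as well as at the very end, so for plain text ending in a line feed the text lands before that final line feed.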
Upvotes: 1