Reputation: 4288
For example, I have a small function that returns the string between two other strings (think between single quotes, double quotes, or even a simple HTML tag).
Dim exp As String = String.Format("{0}(.*?){1}", beginMarker, endMarker)
Now, if I pass "<b>" as beginMarker and "</b>" as endMarker and don't specify RegexOptions.IgnoreCase, it correctly returns the match for the lower-case <b></b>. Once I specify IgnoreCase, however, it never returns a match (assuming the same input). Here's an example function (remove RegexOptions.IgnoreCase and it works). Also, whether or not I escape the markers passed in doesn't seem to change the output; the only difference is the IgnoreCase.
My question is, what am I missing (I used a simple example because I'm not actually parsing HTML with attributes)?
Input: beginMarker = "<b>"
Input: endMarker = "</b>"
Input: searchText = "<b>this is a test</b>"
Input: includeMarkers = True or False (doesn't matter)
Public Shared Function GetStringInBetween(beginMarker As String, endMarker As String, searchText As String, includeMarkers As Boolean) As List(Of String)
    beginMarker = RegularExpressions.Regex.Escape(beginMarker)
    endMarker = RegularExpressions.Regex.Escape(endMarker)
    Dim exp As String = String.Format("{0}(.*?){1}", beginMarker, endMarker)
    Dim regEx As New RegularExpressions.Regex(exp)
    Dim returnList As New List(Of String)
    For Each m As Match In regEx.Matches(searchText, 0, RegexOptions.IgnoreCase)
        If includeMarkers = True Then
            returnList.Add(m.Value)
        Else
            returnList.Add(m.Value.TrimStart(beginMarker.ToCharArray).TrimEnd(endMarker.ToCharArray))
        End If
    Next
    Return returnList
End Function
Upvotes: 1
Views: 2132
Reputation: 25013
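I wouldn't use a .NET class name as the name of a variable, as things could get confusing: VB is case-insensitive, so regEx and Regex are the same identifier.
This works, and I replaced the Trim calls with Substring so the markers are stripped by length, regardless of case: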
Imports System.Text.RegularExpressions
Module Module1
    Public Function GetStringInBetween(beginMarker As String, endMarker As String, searchText As String, includeMarkers As Boolean) As List(Of String)
        Dim exp As String = String.Format("{0}(.*?){1}", Regex.Escape(beginMarker), Regex.Escape(endMarker))
        Dim returnList As New List(Of String)
        For Each m As Match In Regex.Matches(searchText, exp, RegexOptions.IgnoreCase)
            If includeMarkers Then
                returnList.Add(m.Value)
            Else
                ' return the portion of the matched string without the markers
                returnList.Add(m.Value.Substring(beginMarker.Length, m.Value.Length - beginMarker.Length - endMarker.Length))
            End If
        Next
        Return returnList
    End Function
    Sub Main()
        ' include a \ to confirm the regex escaping
        ' outputs: "hello, again"
        Console.WriteLine(String.Join(", ", GetStringInBetween("<x>", "</\x>", "<X>hello</\x> world <x>again</\x>", False).ToArray))
        Console.ReadLine()
    End Sub
End Module
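As an alternative to the Substring arithmetic above, the inner text can probably be read straight from the pattern's capture group, since (.*?) is group 1. This is only a sketch: the module name CaptureGroupSketch and the function name GetStringInBetweenByGroup are made up for illustration and are not part of the answer's code.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module CaptureGroupSketch
    ' Hypothetical variant of GetStringInBetween: same pattern, but the text
    ' between the markers is taken from capture group 1 instead of Substring.
    Public Function GetStringInBetweenByGroup(beginMarker As String, endMarker As String, searchText As String, includeMarkers As Boolean) As List(Of String)
        Dim pattern As String = String.Format("{0}(.*?){1}", Regex.Escape(beginMarker), Regex.Escape(endMarker))
        Dim returnList As New List(Of String)
        For Each m As Match In Regex.Matches(searchText, pattern, RegexOptions.IgnoreCase)
            ' m.Value is the whole match; m.Groups(1).Value is what (.*?) captured
            returnList.Add(If(includeMarkers, m.Value, m.Groups(1).Value))
        Next
        Return returnList
    End Function

    Sub Main()
        ' outputs: one, two
        Console.WriteLine(String.Join(", ", GetStringInBetweenByGroup("<b>", "</b>", "<B>one</B> and <b>two</b>", False).ToArray))
    End Sub
End Module
```

Reading the group avoids depending on the marker lengths at all, so it keeps working if the pattern around the group changes later.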
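Edit: Oh yeah, use Option Strict On too. There is no overload of Regex.Matches that takes (String, Int32, RegexOptions) as parameters. With Option Strict Off, the call most likely binds to the shared Regex.Matches(input, pattern, options) overload and silently converts the 0 to the pattern "0", which is why nothing matched once IgnoreCase was added.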
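To illustrate the overload point, here is a minimal sketch of what the question's call likely amounted to, next to one way of keeping the Regex instance: pass the options to the constructor and call the instance overload Matches(String). The module name IgnoreCaseSketch and the variable names are made up for illustration, and the claim that the 0 was converted to the pattern "0" is a reading of the overload list, not something stated in the question.

```vb
Option Strict On
Imports System
Imports System.Text.RegularExpressions

Module IgnoreCaseSketch
    Sub Main()
        Dim searchText As String = "<b>this is a test</b>"
        ' Upper-case markers, so a match depends on IgnoreCase being applied
        Dim pattern As String = Regex.Escape("<B>") & "(.*?)" & Regex.Escape("</B>")

        ' Assumed equivalent of the question's call under Option Strict Off:
        ' the shared overload Matches(input, pattern, options) with pattern "0"
        Console.WriteLine(Regex.Matches(searchText, "0", RegexOptions.IgnoreCase).Count) ' prints 0

        ' Options given to the constructor, then the instance overload Matches(String)
        Dim rx As New Regex(pattern, RegexOptions.IgnoreCase)
        Console.WriteLine(rx.Matches(searchText).Count) ' prints 1
    End Sub
End Module
```

With Option Strict On, the question's original call should fail to compile instead of quietly searching for "0", which makes this kind of mistake much easier to spot.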
Upvotes: 3