Reputation: 2811
I have a highlighting algorithm that takes a string and adds highlighting codes around the matches in it. The problem is with input like "Find tæst" as the string to be searched and "taest" as the string to find. IndexOf does find the match. However, the ligature æ counts as one character while "ae" is two, so the length of the search string doesn't equal the length of the matched text, and I can't accurately find the end of the match. I don't think IndexOf alone will work for me here. Something that returns both the index of the match and the length of the match would work, but I don't know what else to use.
' cycle through search words and replace them in the text
For intWord = LBound(m_arrSearchWords) To UBound(m_arrSearchWords)
    If m_arrSearchWords(intWord).Length > 0 Then
        ' replace instances of the word with the word surrounded by bold codes
        ' find starting position
        intPos = strText.IndexOf(m_arrSearchWords(intWord), System.StringComparison.CurrentCultureIgnoreCase)
        Do While intPos <> -1
            strText = strText.Substring(0, intPos) & cstrHighlightCodeOn & strText.Substring(intPos, m_arrSearchWords(intWord).Length) & cstrHighlightCodeOff & strText.Substring(intPos + m_arrSearchWords(intWord).Length)
            intPos = strText.IndexOf(m_arrSearchWords(intWord), intPos + m_arrSearchWords(intWord).Length + cstrHighlightCodeOn.Length + cstrHighlightCodeOff.Length, System.StringComparison.CurrentCultureIgnoreCase)
        Loop
    End If
Next intWord
The Substring call fails because intPos plus the length of the search word runs past the end of the string. I put in a fix for strings that end with the search term (not shown above), but in longer strings the highlighting still ends in the wrong place, and I need to fix those too.
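One way to get both the index and the length is to fold the text one character at a time while recording which original character produced each folded character, search the folded text, and map the hit back to original positions. The minimal sketch below is in Python rather than VB.NET so it runs stand-alone. `fold_char`, `LIGATURES` and `find_with_length` are hypothetical names, and the two-entry ligature table is only an assumed stand-in for real culture-aware rules such as those behind `CurrentCultureIgnoreCase`.

```python
# Assumed, simplified folding rules: NOT the rules .NET actually uses.
LIGATURES = {"æ": "ae", "œ": "oe"}


def fold_char(ch: str) -> str:
    """Hypothetical case-insensitive folding of one character:
    lowercase it, then expand a few ligatures."""
    low = ch.lower()
    return LIGATURES.get(low, low)


def find_with_length(text: str, word: str, start: int = 0):
    """Return (index, length) of the first match of `word` in `text` at or
    after `start`, measured in characters of the ORIGINAL text, or None."""
    target = "".join(fold_char(c) for c in word)
    if not target:
        return None
    folded_parts = []
    owner = []  # owner[k] = index in `text` of the char that produced folded char k
    for i in range(start, len(text)):
        piece = fold_char(text[i])
        folded_parts.append(piece)
        owner.extend([i] * len(piece))
    k = "".join(folded_parts).find(target)
    if k == -1:
        return None
    # Map the first and last folded characters of the hit back to the original text
    first = owner[k]
    last = owner[k + len(target) - 1]
    return first, last - first + 1


if __name__ == "__main__":
    text = "Find tæst"
    index, length = find_with_length(text, "taest")
    # prints "Find [tæst]"
    print(text[:index] + "[" + text[index:index + length] + "]" + text[index + length:])
```

A match that begins or ends inside an expanded ligature is widened to cover the whole original character, so the returned span always lines up with whole characters of the original string.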
Upvotes: 0
Views: 128
Reputation: 46445
If I understand correctly, you are looking for a function that returns the "matched string": when you look for s1 inside s2, you want to know exactly which part of s2 was matched (the index of the first and last character matched). That lets you highlight the match without modifying the string (upper/lower case, ligatures, etc.).
I don't have VB.NET, and VBA's search functions are not exactly the same as VB.NET's, so please note that the following VBA code identifies the beginning and end of a match but has only been tested with upper/lower-case matching. I hope it helps you solve the problem.
Option Compare Text
Option Explicit

Function startEndIndex(bigString, smallString)
    ' Returns an array (start, end) with the 1-based positions of the first
    ' and last character of the first match of smallString in bigString,
    ' or (0, 0) if there is no match.
    ' It keeps shortening the candidate string until no match is found;
    ' this is how it handles matches between "similar" strings whose
    ' character counts differ.
    Dim i1, i2
    Dim shorterString
    i2 = 0
    ' first see if there is a match at all:
    i1 = InStr(1, bigString, smallString, vbTextCompare)
    If i1 > 0 Then
        ' the largest value that i2 can have is the end of the string:
        i2 = Len(bigString)
        ' cap the window at twice the length of the search string
        If i2 > i1 + 2 * Len(smallString) Then i2 = i1 + 2 * Len(smallString)
        shorterString = Mid(bigString, i1, i2 - i1)
        ' keep making the string shorter until there is no match;
        ' the match then ends at character i2
        While InStr(1, shorterString, smallString, vbTextCompare) > 0
            i2 = i2 - 1
            shorterString = Mid(bigString, i1, i2 - i1)
        Wend
    End If
    ' return the values as an array (i2 is the last matched character):
    startEndIndex = Array(i1, i2)
End Function

Sub test()
    ' a simple test routine to check that things work:
    Dim a
    Dim longString: longString = "This is a very long TaesT of a complicated string"
    a = startEndIndex(longString, "very long taest")
    If a(0) = 0 And a(1) = 0 Then
        MsgBox "no match found"
    Else
        Dim highlightString As String
        ' wrap characters a(0)..a(1) in asterisks
        highlightString = Left(longString, a(0) - 1) & "*" & Mid(longString, a(0), a(1) - a(0) + 1) & _
            "*" & Mid(longString, a(1) + 1)
        MsgBox "start at " & a(0) & " and end at " & a(1) & vbCrLf & _
            "string matched is '" & Mid(longString, a(0), a(1) - a(0) + 1) & "'" & vbCrLf & _
            "with highlighting: " & highlightString
    End If
End Sub
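The shrinking-window idea in startEndIndex only needs a "does this window contain a match?" test. The Python sketch below is a variation on it, not a translation: instead of taking the start from InStr and capping the window at twice the search length, it grows a prefix until it first contains a match (which fixes the end) and then shrinks the window from the left (which fixes the start). `fold`, `contains` and `start_end_index` are hypothetical names, and `fold` is an assumed stand-in for `vbTextCompare`-style comparison. Indices are 0-based with an inclusive end.

```python
def fold(s: str) -> str:
    # Hypothetical stand-in for text comparison: lowercase and expand
    # two ligatures (assumed rules, not what VBA or .NET actually do)
    return s.lower().replace("æ", "ae").replace("œ", "oe")


def contains(haystack: str, needle: str) -> bool:
    # Plays the role of InStr(1, haystack, needle, vbTextCompare) > 0
    return fold(needle) in fold(haystack)


def start_end_index(big: str, small: str):
    """Return (start, end), 0-based with an inclusive end, for the
    earliest-ending match of `small` in `big`, or None if there is none."""
    if not small or not contains(big, small):
        return None
    # Grow a prefix one character at a time: the first prefix that
    # contains a match ends on that match's last character.
    end = next(j for j in range(1, len(big) + 1) if contains(big[:j], small))
    # Shrink the window from the left while it still contains the match.
    start = 0
    while contains(big[start + 1:end], small):
        start += 1
    return start, end - 1


if __name__ == "__main__":
    s = "This is a very long TaesT of a complicated string"
    start, end = start_end_index(s, "very long taest")
    # prints "This is a *very long TaesT* of a complicated string"
    print(s[:start] + "*" + s[start:end + 1] + "*" + s[end + 1:])
```

Each step re-folds the current window, so the work grows roughly with the square of the text length. That is fine for highlighting short strings but worth keeping in mind for long ones.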
Upvotes: -1
Reputation: 2811
While it would be nice if IndexOf returned the match length, it turns out you can do the comparison yourself to work it out. After IndexOf finds the start of a match, I do a secondary comparison to find the length of the largest match. I start at the length of the searched-for word, which should usually be the largest, and work my way down until a substring of that length compares equal to the word. If no shorter length matches, I work my way up instead. This handles both cases: the matched text being shorter than the search word or longer. In the normal case it costs at least one extra comparison, and in the worst case additional comparisons that scale with the length of the search word (or, when searching longer, with the remaining text). If I had the implementation of IndexOf I might be able to improve on this, but at least this works.
' cycle through search words and highlight them in the text
For intWord = LBound(m_arrSearchWords) To UBound(m_arrSearchWords)
    Dim strWord As String = m_arrSearchWords(intWord)
    If strWord.Length > 0 Then
        ' find starting position
        intPos = strText.IndexOf(strWord, System.StringComparison.CurrentCultureIgnoreCase)
        Do While intPos <> -1
            intOrigLength = strWord.Length
            ' if there isn't enough text left for the full search-word length, use what remains
            If strText.Length < intPos + intOrigLength Then
                intOrigLength = strText.Length - intPos
            End If
            Dim intMatchLength As Integer = 0
            ' find the largest match, working down from the search-word length
            For intLength = intOrigLength To 1 Step -1
                If strWord.Equals(strText.Substring(intPos, intLength), StringComparison.CurrentCultureIgnoreCase) Then
                    intMatchLength = intLength
                    Exit For
                End If
            Next
            ' if we didn't find it by searching smaller, search larger
            If intMatchLength = 0 Then
                For intLength = intOrigLength + 1 To strText.Length - intPos
                    If strWord.Equals(strText.Substring(intPos, intLength), StringComparison.CurrentCultureIgnoreCase) Then
                        intMatchLength = intLength
                        Exit For
                    End If
                Next
            End If
            If intMatchLength > 0 Then
                ' wrap the matched text in highlight codes
                strText = strText.Substring(0, intPos) & cstrHighlightCodeOn & strText.Substring(intPos, intMatchLength) & cstrHighlightCodeOff & strText.Substring(intPos + intMatchLength)
                ' find next, starting after the match and the inserted codes
                intPos = strText.IndexOf(strWord, intPos + intMatchLength + cstrHighlightCodeOn.Length + cstrHighlightCodeOff.Length, System.StringComparison.CurrentCultureIgnoreCase)
            Else
                ' no substring at intPos compared equal: skip ahead so the loop can't repeat forever
                intPos = strText.IndexOf(strWord, intPos + 1, System.StringComparison.CurrentCultureIgnoreCase)
            End If
        Loop
    End If
Next intWord
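The down-then-up length search above can be sketched in Python as follows, with the highlight loop resuming after the inserted codes. `fold`, `same_text` and `index_of` are hypothetical stand-ins for `String.Equals` and `String.IndexOf` with `CurrentCultureIgnoreCase` (the two-entry ligature expansion is an assumption, not .NET's actual rules). `index_of` simply scans for the first position where `match_length` succeeds, and `<b>`/`</b>` stand in for `cstrHighlightCodeOn`/`cstrHighlightCodeOff`.

```python
from typing import Iterable

ON, OFF = "<b>", "</b>"  # stand-ins for cstrHighlightCodeOn / cstrHighlightCodeOff


def fold(s: str) -> str:
    # Hypothetical culture-aware, case-insensitive folding (assumed rules)
    return s.lower().replace("æ", "ae").replace("œ", "oe")


def same_text(a: str, b: str) -> bool:
    # Plays the role of String.Equals(..., CurrentCultureIgnoreCase)
    return fold(a) == fold(b)


def match_length(text: str, pos: int, word: str) -> int:
    """Length of the text at `pos` that compares equal to `word`, or 0."""
    longest = min(len(word), len(text) - pos)
    # Work down from the search-word length: largest match no longer than the word
    for n in range(longest, 0, -1):
        if same_text(text[pos:pos + n], word):
            return n
    # If nothing that short matched, work up through longer substrings
    for n in range(longest + 1, len(text) - pos + 1):
        if same_text(text[pos:pos + n], word):
            return n
    return 0


def index_of(text: str, word: str, start: int) -> int:
    # Plays the role of IndexOf(..., CurrentCultureIgnoreCase)
    for i in range(start, len(text)):
        if match_length(text, i, word):
            return i
    return -1


def highlight(text: str, words: Iterable[str]) -> str:
    for word in words:
        if not word:
            continue
        pos = index_of(text, word, 0)
        while pos != -1:
            n = match_length(text, pos, word)
            if n == 0:
                # Defensive: never revisit the same position forever
                pos = index_of(text, word, pos + 1)
                continue
            text = text[:pos] + ON + text[pos:pos + n] + OFF + text[pos + n:]
            # Resume the search after the match and the inserted codes
            pos = index_of(text, word, pos + n + len(ON) + len(OFF))
    return text


if __name__ == "__main__":
    # prints "Find <b>tæst</b> and <b>TAEST</b>"
    print(highlight("Find tæst and TAEST", ["taest"]))
```

Because the highlighted span uses the measured length rather than the search word's length, "Find tæst" no longer overruns the end of the string. The nested scans make this sketch slow on long texts; it is meant to show the logic, not to be efficient.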
Upvotes: 0