Reputation: 6909
I am searching a file for a phrase made of several words, for example "one two three". I have been using:
Dim text As String = File.ReadAllText(filepath)
Dim index As Integer = -1
For Each phrase In phrases
    index = text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase)
    If index >= 0 Then
        Exit For
    End If
Next
It worked fine, but I have now discovered that some files contain the target phrases with more than one whitespace character between words.
For example, my code finds
"one two three"
but fails to find
"one  two   three"
Is there a way, using regular expressions or any other technique, to match the phrase even when the words are separated by more than one whitespace character?
I know I could first replace the double spaces:
Dim text As String = File.ReadAllText(filepath)
text = text.Replace("  ", " ")
Dim index As Integer = -1
For Each phrase In phrases
    index = text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase)
    If index >= 0 Then
        Exit For
    End If
Next
but I wanted to know whether there is a more efficient way to do this.
Upvotes: 0
Views: 364
Reputation: 1093
You can write a function that keeps replacing double spaces with single spaces until none remain.
Option Strict On
Option Explicit On
Option Infer Off

Public Class Form1
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim testString As String = "one  two   three    four five  six"
        Dim excessSpacesGone As String = RemoveExcessSpaces(testString)
        ' excessSpacesGone is now "one two three four five six"
        Clipboard.SetText(excessSpacesGone)
        MsgBox(excessSpacesGone)
    End Sub

    ' Replaces double spaces with single spaces until no double spaces remain.
    Function RemoveExcessSpaces(source As String) As String
        Dim result As String = source
        Do
            result = result.Replace("  ", " ")
        Loop Until result.IndexOf("  ") = -1
        Return result
    End Function
End Class
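If the repeated `Replace` loop feels heavy, a single regular-expression replace can collapse every run of spaces in one pass, which is usually cheaper than rescanning the string until no double spaces remain. This is a minimal sketch, assuming only literal space characters should be collapsed (use `\s{2,}` instead if tabs and line breaks count too); `SpaceCollapser` and `CollapseSpaces` are hypothetical names.

```vbnet
Imports System.Text.RegularExpressions

Module SpaceCollapser
    ' Hypothetical helper: replaces every run of two or more spaces
    ' with a single space in one pass, so no Do...Loop is needed.
    Function CollapseSpaces(source As String) As String
        Return Regex.Replace(source, " {2,}", " ")
    End Function
End Module
```

For example, `CollapseSpaces("one  two   three")` gives `"one two three"`.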
Upvotes: 1
Reputation:
The comments in the code explain each step:
' Needs "Imports System.Text.RegularExpressions" at the top of the file.
Dim inputStr As String = "This contains one  Two   three and some other words" ' <--- the text read from the file
inputStr = Regex.Replace(inputStr, "\s{2,}", " ") ' <--- collapse runs of whitespace, if any, to a single space
Dim searchStr As String = "one two three" ' <--- the phrase to search for
searchStr = Regex.Replace(searchStr, "\s{2,}", " ") ' <--- collapse runs of whitespace in the phrase too
If UCase(inputStr).Contains(UCase(searchStr)) Then ' <--- case-insensitive check: does the input contain the phrase?
    MsgBox("contains") ' <--- show a message if it does
End If
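If you also need the position of the match, as the question's `IndexOf` loop does, the same normalization can be combined with a case-insensitive `IndexOf` instead of `UCase`, and the text only needs to be normalized once, outside the loop. This is a minimal sketch; `PhraseSearch` and `FindFirstPhrase` are hypothetical names, `\s+` also turns tabs and line breaks into single spaces, and the returned index refers to the normalized text, not to offsets in the original file.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PhraseSearch
    ' Hypothetical helper: normalizes whitespace in the text and in each phrase,
    ' then returns the index of the first phrase found in the normalized text,
    ' or -1 if none of the phrases occur.
    Function FindFirstPhrase(text As String, phrases As IEnumerable(Of String)) As Integer
        ' Normalize the (possibly large) text once, outside the loop.
        Dim normalizedText As String = Regex.Replace(text, "\s+", " ")
        For Each phrase As String In phrases
            Dim normalizedPhrase As String = Regex.Replace(phrase.Trim(), "\s+", " ")
            ' An empty phrase would "match" at index 0, so skip it.
            If normalizedPhrase.Length = 0 Then Continue For
            Dim index As Integer = normalizedText.IndexOf(normalizedPhrase, StringComparison.OrdinalIgnoreCase)
            If index >= 0 Then
                Return index
            End If
        Next
        Return -1
    End Function
End Module
```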
Upvotes: 1
Reputation: 8160
You could convert each phrase into a regular expression with \s+ between the words, and then check the text for a match against it, e.g.:
' Needs "Imports System.Text.RegularExpressions" at the top of the file.
Dim text = "This contains one Two three"
Dim phrases = {
    "one two three"
}
' Split each phrase into words and join them with \s+ to build the regex.
For Each phrase In phrases.Select(Function(p) String.Join("\s+", p.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)))
    If Regex.IsMatch(text, phrase, RegexOptions.IgnoreCase) Then
        Console.WriteLine("Found!")
        Exit For
    End If
Next
Note that this doesn't check for word boundaries at the beginning or end of the phrase, so "This contains someone two threesome" would also match. If you don't want that, add "\b" (a word boundary) at both ends of the regex.
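A sketch of the whole-word variant, using `\b` at both ends of the pattern. It also passes each word through `Regex.Escape`, so characters such as `.` or `(` in a phrase are matched literally, and it returns `Match.Index`, which (unlike the normalization approaches) is an offset into the original, unmodified text. `WholeWordPhraseSearch`, `BuildPhrasePattern` and `FindPhraseIndex` are hypothetical names; note that `\b` only behaves as a word boundary when the phrase starts and ends with word characters (letters, digits, underscore).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module WholeWordPhraseSearch
    ' Hypothetical helper: turns "one two three" into "\bone\s+two\s+three\b".
    ' Regex.Escape keeps regex metacharacters in the words literal.
    Function BuildPhrasePattern(phrase As String) As String
        Dim words As String() = phrase.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
        Return "\b" & String.Join("\s+", words.Select(Function(w) Regex.Escape(w))) & "\b"
    End Function

    ' Returns the index (in the original text) of the first phrase found, or -1.
    Function FindPhraseIndex(text As String, phrases As IEnumerable(Of String)) As Integer
        For Each phrase As String In phrases
            ' A blank phrase would produce "\b\b", which matches almost anywhere.
            If String.IsNullOrWhiteSpace(phrase) Then Continue For
            Dim m As Match = Regex.Match(text, BuildPhrasePattern(phrase), RegexOptions.IgnoreCase)
            If m.Success Then
                Return m.Index
            End If
        Next
        Return -1
    End Function
End Module
```

With this pattern, "This contains one  Two   three" matches at index 14, while "This contains someone two threesome" no longer matches.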
Upvotes: 0