Coding Duchess

Reputation: 6909

Scan a file for a string of words, ignoring extra whitespace, using VB.NET

I am searching a file for a string of words. For example "one two three". I have been using:

Dim text As String = File.ReadAllText(filepath)
Dim index As Integer = -1
For Each phrase In phrases
    index = text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase)
    If index >= 0 Then
        Exit For
    End If
Next

It worked fine, but I have now discovered that some files contain the target phrases with more than one whitespace character between words.

For example, my code finds

"one two three" but fails to find "one  two   three".

Is there a way to use regular expressions, or any other technique, to find the phrase even when the words are separated by more than one whitespace character?

I know I could use

Dim text As String = File.ReadAllText(filepath)
For Each phrase In phrases
    text = text.Replace("  ", " ")
    index = text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase)
    If index >= 0 Then
        Exit For
    End If
Next

But I wanted to know whether there is a more efficient way to accomplish that.

Upvotes: 0

Views: 364

Answers (3)

Paul Ishak

Reputation: 1093

You can make a function to remove any double spaces.

Option Strict On
Option Explicit On
Option Infer Off
Public Class Form1
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim testString As String = "one two  three   four    five        six"
        Dim excessSpacesGone As String = RemoveExcessSpaces(testString)
        'one two three four five six
        Clipboard.SetText(excessSpacesGone)
        MsgBox(excessSpacesGone)
    End Sub
    Function RemoveExcessSpaces(source As String) As String
        Dim result As String = source
        Do
            result = result.Replace("  ", " ")
        Loop Until result.IndexOf("  ") = -1
        Return result
    End Function
End Class

Upvotes: 1

user3972104

Reputation:

The comments in the code explain each step.

' Requires: Imports System.Text.RegularExpressions
Dim inputStr As String = "This contains one        Two  three and some     other words" ' Text read from the file
inputStr = Regex.Replace(inputStr, "\s+", " ") ' Collapse every run of whitespace to a single space
Dim searchStr As String = "one two three" ' Phrase to search for
searchStr = Regex.Replace(searchStr, "\s+", " ") ' Normalize the phrase the same way
If inputStr.IndexOf(searchStr, StringComparison.OrdinalIgnoreCase) >= 0 Then ' Case-insensitive containment check
    MsgBox("contains") ' Report the match
End If

Upvotes: 1

Mark

Reputation: 8160

You could convert your phrases into regular expressions with \s+ between each word, and then check the text for matches against that. e.g.

Dim text = "This contains one    Two  three"
Dim phrases = {
    "one two three"
}
' Split each phrase into words, escape each word, and join the words with \s+ to build the regex.
For Each phrase In phrases.Select(Function(p) String.Join("\s+", p.Split({" "c}, StringSplitOptions.RemoveEmptyEntries).Select(Function(w) Regex.Escape(w))))
    If Regex.IsMatch(text, phrase, RegexOptions.IgnoreCase) Then
        Console.WriteLine("Found!")
        Exit For
    End If
Next

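Note that this doesn't check for word boundaries at the beginning or end of the phrase, so "This contains someone two threesome" would also match. If you don't want that, add "\b" at both ends of the regex. Adding "\s" instead also rules out that false match, but it requires a whitespace character on each side, so it misses a phrase at the very start or end of the text.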
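
A minimal sketch of the boundary check, assuming the zero-width `\b` anchor is acceptable and that each phrase starts and ends with a letter or digit. The names `PhraseSearch`, `BuildPhrasePattern` and `ContainsPhrase` are hypothetical helpers made up for this illustration, not part of the code above.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module PhraseSearch
    ' Hypothetical helper: turns "one two three" into "\bone\s+two\s+three\b".
    Function BuildPhrasePattern(phrase As String) As String
        ' Split the phrase on runs of whitespace and escape each word so that
        ' regex metacharacters in a word are matched literally.
        Dim words = Regex.Split(phrase.Trim(), "\s+").
                          Select(Function(w) Regex.Escape(w))
        ' \s+ tolerates any amount of whitespace between words;
        ' \b requires a word boundary at each end of the phrase.
        Return "\b" & String.Join("\s+", words) & "\b"
    End Function

    ' Hypothetical helper: case-insensitive search for the phrase in the text.
    Function ContainsPhrase(text As String, phrase As String) As Boolean
        Return Regex.IsMatch(text, BuildPhrasePattern(phrase), RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(ContainsPhrase("This contains one    Two  three", "one two three"))      ' True
        Console.WriteLine(ContainsPhrase("This contains someone two threesome", "one two three")) ' False
    End Sub
End Module
```

Because `\b` matches a position rather than a character, the phrase is still found at the very start or end of the file, and the match itself contains nothing outside the phrase.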

Upvotes: 0
