DazEvans

Cleaning text strings in vb

I am trying to clean up a string from a text field that will be part of an SQL query.

I have created a function:

Private Function cleanStringToProperCase(dirtyString As String) As String
    Dim cleanedString As String = dirtyString
    'removes every character except word characters (letters, digits, underscore), @, - and .'
    cleanedString = Regex.Replace(cleanedString, "[^\w\.@-]", "")
    'trims unnecessary spaces off left and right'
    cleanedString = Trim(cleanedString)
    'replaces double spaces with single spaces'
    cleanedString = Regex.Replace(cleanedString, "  ", " ")
    'converts the first letter of each word to upper case'
    cleanedString = StrConv(cleanedString, VbStrConv.ProperCase)

    'return the nicely cleaned string'
    Return cleanedString
End Function

But when I try to clean any text with two words, it strips ALL white space out. "daz's bike" becomes "Dazsbike". I am assuming I need to modify the following line:

   cleanedString = Regex.Replace(cleanedString, "[^\w\.@-]", "")

so that it also lets single spaces remain. Any advice on how to do so would be gratefully received, as I cannot find it in any online tutorials or on the MSDN site (http://msdn.microsoft.com/en-us/library/844skk0h(v=vs.110).aspx)

Answers (2)

dbasnett

Or, if you are not a big fan of regex...

Private Function cleanStringToProperCase(dirtyString As String) As String
    'specify valid characters
    Dim validChars As String = " @-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    'removes all but validChars'
    Dim cleanedString As String = dirtyString.Where(Function(c) validChars.Contains(c)).ToArray
    Dim myTI As Globalization.TextInfo = New Globalization.CultureInfo(Globalization.CultureInfo.CurrentCulture.Name).TextInfo

    'return the nicely cleaned string'
    Return myTI.ToTitleCase(cleanedString.Trim)
End Function
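
A minimal, self-contained sketch of the whitelist approach above. It differs from the answer in two assumed details: it builds the result with New String(...) so the Char() to String conversion is explicit, and it uses CultureInfo.InvariantCulture instead of the current culture so the title-casing does not depend on the machine's locale. The module and function names are hypothetical. Note that the apostrophe is not in the whitelist, so "daz's bike" comes out as "Dazs Bike"; adding ' to the list would keep it.

```vbnet
Imports System
Imports System.Globalization
Imports System.Linq

Module WhitelistSketch
    ' Characters allowed through the filter (same list as the answer above).
    Private Const ValidChars As String = " @-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

    ' Hypothetical helper: keeps only whitelisted characters, then title-cases the result.
    Function CleanWithWhitelist(dirtyString As String) As String
        ' Keep only characters that appear in ValidChars and rebuild a String from them.
        Dim cleaned As New String(dirtyString.Where(Function(c) ValidChars.IndexOf(c) >= 0).ToArray())
        ' Title-case with the invariant culture (an assumption; the answer uses the current culture).
        Return CultureInfo.InvariantCulture.TextInfo.ToTitleCase(cleaned.Trim())
    End Function
End Module
```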

Teejay

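Use "[^\w\.,@\-\' ]" instead of your pattern string. Because the character class now also contains a comma, an apostrophe and a space, those characters are no longer removed, so "daz's bike" keeps its apostrophe and the space between the words.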
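
A small sketch comparing the two character classes on their own, without trimming or casing. The module, constant and function names are hypothetical; only the two pattern strings come from the question and this answer. In .NET regular expressions, escaping a non-word character such as \' or \- inside a character class simply matches that literal character.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternComparison
    ' Pattern from the question: spaces and apostrophes are not in the class, so they get removed.
    Public Const QuestionPattern As String = "[^\w\.@-]"
    ' Pattern from this answer: additionally keeps ",", "'" and " ".
    Public Const AnswerPattern As String = "[^\w\.,@\-\' ]"

    ' Removes every character matched by the given negated character class.
    Function RemoveMatches(input As String, pattern As String) As String
        Return Regex.Replace(input, pattern, "")
    End Function
End Module
```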

Also, I would use

Regex.Replace(cleanedString, " +", " ")

instead of

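Regex.Replace(cleanedString, "  ", " ")

The pattern " +" replaces a whole run of spaces with a single space, whereas "  " only replaces non-overlapping pairs in a single pass, so three spaces still end up as two.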
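
Putting both suggestions together, a minimal sketch of the question's function with the new character class and the " +" pattern might look like this. The function name is taken from the question; the module name is hypothetical, and the order of the steps (filter, trim, collapse, case) follows the question's original code.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CleanStringSketch
    ' The question's function with both suggestions from this answer applied.
    Function CleanStringToProperCase(dirtyString As String) As String
        ' Remove everything except word characters (letters, digits, underscore),
        ' ".", ",", "@", "-", "'" and spaces.
        Dim cleanedString As String = Regex.Replace(dirtyString, "[^\w\.,@\-\' ]", "")
        ' Trim leading and trailing spaces.
        cleanedString = Trim(cleanedString)
        ' Collapse each run of one or more spaces into a single space.
        cleanedString = Regex.Replace(cleanedString, " +", " ")
        ' Upper-case the first letter of each space-separated word and lower-case the rest.
        Return StrConv(cleanedString, VbStrConv.ProperCase)
    End Function
End Module
```

With these changes, CleanStringToProperCase("daz's bike") should return "Daz's Bike" rather than "Dazsbike".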
