Marcus Widerberg
Marcus Widerberg

Reputation: 155

Function to trim leading and trailing whitespace in vba

I have checked quite a few suggestions re trimming leading & trailing whitespace in vba (excel, incidentally).

I have found this solution, but it also trims å ä ö (also caps) and I am too weak in regex to see why:

Function MultilineTrim (Byval TextData)
    Dim textRegExp
    Set textRegExp = new regexp
    textRegExp.Pattern = "\s{0,}(\S{1}[\s,\S]*\S{1})\s{0,}"
    textRegExp.Global = False
    textRegExp.IgnoreCase = True
    textRegExp.Multiline = True

    If textRegExp.Test (TextData) Then
      MultilineTrim = textRegExp.Replace (TextData, "$1")
    Else
      MultilineTrim = ""
    End If
End Function

(this is from an answer here at SO, where the useraccount seems inactive:

https://stackoverflow.com/a/1606433/3701019 )

So, I would love if anyone could help with either (a) an alternative solution to the problem or (b) a version of the regexp / code that would not strip out (single) åäö characters.

Thanks for any help!

Details: Problem

My context is a xmlparser in vba, where it gets chunks of xml to parse. It sometimes just gets a character from the stream, which may be å ä ö, which then this function strips away completely.

I would be happy to clarify or edit this question, of course.

FYI: I have shared exactly what I did based on the answers, see below.

Upvotes: 6

Views: 11195

Answers (6)

Sebastian Viereck
Sebastian Viereck

Reputation: 5885

Refactored and improved Richard Vivians version

Function cleanMyString(sInput)
    ' Remove leading and trailing spaces
    sInput = Trim(sInput)
    'Remove other characters that you dont want
    sInput = Replace(sInput, Chr(10), "")
    sInput = Replace(sInput, Chr(13), "")
    sInput = Replace(sInput, Chr(9), "")
    cleanMyString = sInput
End Function

Upvotes: 1

Marcus Widerberg
Marcus Widerberg

Reputation: 155

After consulting with stackexchange people on how to do this, I am adding the edit of the question as my own answer, instead. Here it is:

Answer / Used code

Thanks to the answer(s), this is what I will be using:

Function MultilineTrim(ByVal TextData)
    MultilineTrim = textRegExp.Replace(TextData, "")

'    If textRegExp.Test(TextData) Then
'        MultilineTrim = textRegExp.Replace(TextData, "$1")
'    Else
'        MultilineTrim = "" ' ??
'    End If
End Function

Private Sub InitRegExp()
    Set textRegExp = New RegExp
    'textRegExp.Pattern = "\s{0,}(\S{1}[\s,\S]*\S{1})\s{0,}" 'this removes å ä ö - bug!
    'textRegExp.Global = False

    'textRegExp.Pattern = "(^[ \t]+|[ \t]+$)" ' leaves a line break at start
    textRegExp.Pattern = "^[\s\xA0]+|[\s\xA0]+$" ' works! Ron Rosenfelds submit

    textRegExp.Global = True

    textRegExp.IgnoreCase = True
    textRegExp.MultiLine = True
End Sub

Thanks again all! (nod to Ron Rosenfeld)

Upvotes: 1

Mark Karam
Mark Karam

Reputation: 1

I would call Trim after replacing all the other characters. This way if there are spaces after the other characters they will also be removed.

Upvotes: -2

Richard Vivian
Richard Vivian

Reputation: 1750

You can create a custom function that strips out the characters that you don't want specifically.

Private Function CleanMyString(sInput As String) As String
   Dim sResult As String

   ' Remove leading ans trailing spaces
   sResult = Trim(sInput)
   'Remove other characters that you dont want
   sResult = Replace(sResult, chr(10), "")
   sResult = Replace(sResult, chr(13), "")
   sResult = Replace(sResult, chr(9), "")

End Function

This does not use regex though. Not sure if thats OK for your requirements?

Upvotes: 1

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60224

For a regex I would use:

^[\s\xA0]+|[\s\xA0]+$

This will match the "usual" whitespace characters as well as the NBSP, commonly found in HTML documents.

VBA Code would look something like below, where S is the line to Trim:

Dim RE as Object, ResultString as String
Set RE = CreateObject("vbscript.regexp")
RE.MultiLine = True
RE.Global = True
RE.Pattern = "^[\s\xA0]+|[\s\xA0]+$"
ResultString = RE.Replace(S, "")

And an explanation of the regex:

Trim whitespace at the start and the end of each line
-----------------------------------------------------

^[\s\xA0]+|[\s\xA0]+$

Options:  ^$ match at line breaks

Match this alternative (attempting the next alternative only if this one fails) «^[\s\xA0]+»
   Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
   Match a single character present in the list below «[\s\xA0]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»
      The character with position 0xA0 (160 decimal) in the character set «\xA0»
Or match this alternative (the entire match attempt fails if this one fails to match) «[\s\xA0]+$»
   Match a single character present in the list below «[\s\xA0]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»
      The character with position 0xA0 (160 decimal) in the character set «\xA0»
   Assert position at the end of a line (at the end of the string or before a line break character) «$»

Created with RegexBuddy

Upvotes: 7

AnotherParker
AnotherParker

Reputation: 790

Try this:

Function MultilineTrim (Byval TextData)
    Dim textRegExp
    Set textRegExp = new regexp
    textRegExp.Pattern = "(^[ \t]+|[ \t]+$)"
    textRegExp.Global = True
    textRegExp.IgnoreCase = True
    textRegExp.Multiline = True

    MultilineTrim = textRegExp.Replace (TextData, "")
End Function

Upvotes: 1

Related Questions