Reputation: 155
I have checked quite a few suggestions re trimming leading & trailing whitespace in vba (excel, incidentally).
I have found this solution, but it also trims å ä ö (also caps) and I am too weak in regex to see why:
Function MultilineTrim (Byval TextData)
Dim textRegExp
Set textRegExp = new regexp
textRegExp.Pattern = "\s{0,}(\S{1}[\s,\S]*\S{1})\s{0,}"
textRegExp.Global = False
textRegExp.IgnoreCase = True
textRegExp.Multiline = True
If textRegExp.Test (TextData) Then
MultilineTrim = textRegExp.Replace (TextData, "$1")
Else
MultilineTrim = ""
End If
End Function
(this is from an answer here at SO, where the useraccount seems inactive:
https://stackoverflow.com/a/1606433/3701019 )
So, I would love if anyone could help with either (a) an alternative solution to the problem or (b) a version of the regexp / code that would not strip out (single) åäö characters.
Thanks for any help!
Details: Problem
My context is a xmlparser in vba, where it gets chunks of xml to parse. It sometimes just gets a character from the stream, which may be å ä ö, which then this function strips away completely.
I would be happy to clarify or edit this question, of course.
FYI: I have shared exactly what I did based on the answers, see below.
Upvotes: 6
Views: 11195
Reputation: 5885
Refactored and improved Richard Vivians version
Function cleanMyString(sInput)
' Remove leading and trailing spaces
sInput = Trim(sInput)
'Remove other characters that you dont want
sInput = Replace(sInput, Chr(10), "")
sInput = Replace(sInput, Chr(13), "")
sInput = Replace(sInput, Chr(9), "")
cleanMyString = sInput
End Function
Upvotes: 1
Reputation: 155
After consulting with stackexchange people on how to do this, I am adding the edit of the question as my own answer, instead. Here it is:
Thanks to the answer(s), this is what I will be using:
Function MultilineTrim(ByVal TextData)
MultilineTrim = textRegExp.Replace(TextData, "")
' If textRegExp.Test(TextData) Then
' MultilineTrim = textRegExp.Replace(TextData, "$1")
' Else
' MultilineTrim = "" ' ??
' End If
End Function
Private Sub InitRegExp()
Set textRegExp = New RegExp
'textRegExp.Pattern = "\s{0,}(\S{1}[\s,\S]*\S{1})\s{0,}" 'this removes å ä ö - bug!
'textRegExp.Global = False
'textRegExp.Pattern = "(^[ \t]+|[ \t]+$)" ' leaves a line break at start
textRegExp.Pattern = "^[\s\xA0]+|[\s\xA0]+$" ' works! Ron Rosenfelds submit
textRegExp.Global = True
textRegExp.IgnoreCase = True
textRegExp.MultiLine = True
End Sub
Thanks again all! (nod to Ron Rosenfeld)
Upvotes: 1
Reputation: 1
I would call Trim after replacing all the other characters. This way if there are spaces after the other characters they will also be removed.
Upvotes: -2
Reputation: 1750
You can create a custom function that strips out the characters that you don't want specifically.
Private Function CleanMyString(sInput As String) As String
Dim sResult As String
' Remove leading ans trailing spaces
sResult = Trim(sInput)
'Remove other characters that you dont want
sResult = Replace(sResult, chr(10), "")
sResult = Replace(sResult, chr(13), "")
sResult = Replace(sResult, chr(9), "")
End Function
This does not use regex though. Not sure if thats OK for your requirements?
Upvotes: 1
Reputation: 60224
For a regex I would use:
^[\s\xA0]+|[\s\xA0]+$
This will match the "usual" whitespace characters as well as the NBSP, commonly found in HTML documents.
VBA Code would look something like below, where S is the line to Trim:
Dim RE as Object, ResultString as String
Set RE = CreateObject("vbscript.regexp")
RE.MultiLine = True
RE.Global = True
RE.Pattern = "^[\s\xA0]+|[\s\xA0]+$"
ResultString = RE.Replace(S, "")
And an explanation of the regex:
Trim whitespace at the start and the end of each line
-----------------------------------------------------
^[\s\xA0]+|[\s\xA0]+$
Options: ^$ match at line breaks
Match this alternative (attempting the next alternative only if this one fails) «^[\s\xA0]+»
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match a single character present in the list below «[\s\xA0]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»
The character with position 0xA0 (160 decimal) in the character set «\xA0»
Or match this alternative (the entire match attempt fails if this one fails to match) «[\s\xA0]+$»
Match a single character present in the list below «[\s\xA0]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»
The character with position 0xA0 (160 decimal) in the character set «\xA0»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
Created with RegexBuddy
Upvotes: 7
Reputation: 790
Try this:
Function MultilineTrim (Byval TextData)
Dim textRegExp
Set textRegExp = new regexp
textRegExp.Pattern = "(^[ \t]+|[ \t]+$)"
textRegExp.Global = True
textRegExp.IgnoreCase = True
textRegExp.Multiline = True
MultilineTrim = textRegExp.Replace (TextData, "")
End Function
Upvotes: 1