Reputation: 33120
In the old days, we could specify any character with Chr(56).
For example, say the character is unprintable and we want to put it in a string. Just do
Dim a As String = Chr(56)
Now we have UTF-8 or Unicode (or whatever encoding).
Say I want variable a to contain any of the following:
    en space
    em space
    thin space
‌ ‌ zero width non-joiner
‍ ‍ zero width joiner
‎ ‎ left-to-right mark
‏ ‏ right-to-left mark
In fact, say I want to create a function that will get rid of all such characters in my string.
How would I do so?
I want the function to leave Chinese, Korean, and Japanese characters intact and remove only the really obscure ones, like those listed above.
Upvotes: 1
Views: 6063
Reputation: 5403
''' <summary>
''' This function replaces 'smart quotes' (ASC 145, 146, 147, 148, 150) with their correct ASCII versions (ASC 39, 34, 45), and replaces any other non-ASCII characters with "?"
''' </summary>
''' <param name="expression"></param>
''' <returns></returns>
''' <remarks></remarks>
Public Function Unicode2ASCII(ByVal expression As String) As String
    Dim sb As New System.Text.StringBuilder
    For i As Integer = 1 To Len(expression)
        Dim s As String = Mid(expression, i, 1)
        ' Note: Asc returns the code in the system's ANSI code page,
        ' so the values 145-150 below assume Windows-1252.
        Select Case Asc(s)
            Case 145, 146 'apostrophes'
                sb.Append("'"c)
            Case 147, 148 'inverted commas'
                sb.Append(""""c)
            Case 150 'hyphen'
                sb.Append("-"c)
            Case Is > 127
                sb.Append("?"c)
            Case Else
                sb.Append(s)
        End Select
    Next i
    Return sb.ToString
End Function
Or to add them...
Dim s As String = "a" & ChrW(8194) & "b"
MsgBox(s)
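Note that mapping everything above 127 to "?" would also wipe out Chinese, Korean, and Japanese text, which the question wants to keep. One possible alternative, sketched below, is to filter by Unicode category instead: Char.GetUnicodeCategory reports the joiners and directional marks as Format and the en, em, and thin spaces as SpaceSeparator. The function name StripFormatAndOddSpaces is made up for this illustration, and the choice of categories is an assumption; it would also drop the ideographic space (U+3000), so adjust it if that matters for your text.

```vb
Imports System.Globalization
Imports System.Text

Module CategoryFilterDemo
    ' Hypothetical helper: drops format characters (ZWJ, ZWNJ, LRM, RLM, ...)
    ' and every space separator except the ordinary ASCII space.
    Public Function StripFormatAndOddSpaces(ByVal source As String) As String
        Dim sb As New StringBuilder(source.Length)
        For Each c As Char In source
            Dim cat As UnicodeCategory = Char.GetUnicodeCategory(c)
            If cat = UnicodeCategory.Format Then Continue For
            If cat = UnicodeCategory.SpaceSeparator AndAlso c <> " "c Then Continue For
            sb.Append(c)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' "a", space, en space, "b", zero width joiner, one Korean syllable
        Dim s As String = "a " & ChrW(8194) & "b" & ChrW(8205) & ChrW(&HD55C)
        ' The en space and the joiner are removed, so 4 characters remain.
        Console.WriteLine(StripFormatAndOddSpaces(s).Length) ' prints 4
    End Sub
End Module
```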
Upvotes: 1
Reputation: 100545
Replace removes whatever you want. ChrW produces a Unicode character from its code point (for characters outside Unicode Plane 0, the Basic Multilingual Plane, you need to concatenate two Char values, i.e. a surrogate pair).
Something like:
Dim cleaned As String = Replace("My text", ChrW(8194), "")
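To cover the whole list from the question in one pass, the same idea can be extended to a set of characters. The following is only a sketch: the names Unwanted and StripInvisible are invented for this example, and the code points are the standard ones for the characters listed in the question (U+2002, U+2003, U+2009, U+200C to U+200F). Because only these characters are dropped, Chinese, Korean, and Japanese text passes through untouched.

```vb
Imports System.Text

Module InvisibleCharDemo
    ' Code points for: en space, em space, thin space,
    ' zero width non-joiner, zero width joiner,
    ' left-to-right mark, right-to-left mark.
    Private ReadOnly Unwanted As Char() = {
        ChrW(&H2002), ChrW(&H2003), ChrW(&H2009),
        ChrW(&H200C), ChrW(&H200D), ChrW(&H200E), ChrW(&H200F)
    }

    ' Hypothetical helper: copies every character that is not in Unwanted.
    Public Function StripInvisible(ByVal source As String) As String
        Dim sb As New StringBuilder(source.Length)
        For Each c As Char In source
            If Array.IndexOf(Unwanted, c) < 0 Then
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' "a", en space, "b", zero width joiner, two Chinese characters
        Dim s As String = "a" & ChrW(8194) & "b" & ChrW(&H200D) & ChrW(&H4E2D) & ChrW(&H6587)
        ' Two of the six characters are removed.
        Console.WriteLine(StripInvisible(s).Length) ' prints 4
    End Sub
End Module
```

For a character outside Plane 0, Char.ConvertFromUtf32 returns the surrogate pair as a String, which saves concatenating the two Char values by hand.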
Upvotes: 1
Reputation: 43743
It seems like there ought to be a better way, but the best I can come up with that would work in all situations would be something like this:
' Requires: Imports System.Xml
Private Function getString(ByVal xmlCharacterCode As String) As String
    ' Let the XML parser decode the character reference, e.g. "&#8194;"
    Dim doc As XmlDocument = New XmlDocument()
    doc.LoadXml("<?xml version=""1.0"" encoding=""utf-8""?><test>" & xmlCharacterCode & "</test>")
    Return doc.InnerText
End Function
And then use it like this:
myString = myString.Replace(getString("&#8194;"), "")
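If the goal is only to turn a numeric character reference into the character itself, a lighter option may be System.Net.WebUtility.HtmlDecode (available from .NET Framework 4.0), which avoids building an XML document. This is a sketch of that alternative rather than the approach above; the module name and sample string are made up for illustration.

```vb
Imports System.Net

Module DecodeDemo
    Sub Main()
        ' Sample input: "a", en space (U+2002), "b"
        Dim myString As String = "a" & ChrW(8194) & "b"

        ' HtmlDecode resolves the numeric reference to the en space character.
        Dim enSpace As String = WebUtility.HtmlDecode("&#8194;")

        myString = myString.Replace(enSpace, "")
        Console.WriteLine(myString) ' prints ab
    End Sub
End Module
```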
Also, you may want to take a look at this page I found:
Easy way to convert &#XXXX; from HTML to UTF-8 xml either programmaticaly in .Net or using tools
Upvotes: 0