Reputation: 3025
How do you determine whether a character is a letter in the range A-Z or a digit in the range 0-9? We are getting some corrupted data: "I_999Š=ÄÖÆaðøñòòñ".
I thought I could use Char.IsLetterOrDigit("Š"c) to identify the corrupted data in "I_999Š", but unexpectedly it returns True. I need to trap this; any thoughts?
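The True result is expected: Char.IsLetterOrDigit tests Unicode general categories, not ASCII ranges, and Š (U+0160, LATIN CAPITAL LETTER S WITH CARON) is an uppercase letter in Unicode. A minimal C# sketch that shows the code point and category behind that decision; the class and method names are hypothetical, chosen only for illustration:

```csharp
using System.Globalization;

public static class LetterCategoryDemo
{
    // Returns the code point and the Unicode general category that
    // char.IsLetterOrDigit consults, e.g. "U+0160 UppercaseLetter".
    public static string Describe(char c)
    {
        UnicodeCategory category = char.GetUnicodeCategory(c);
        return $"U+{(int)c:X4} {category}";
    }
}
```

For example, Describe('\u0160') returns "U+0160 UppercaseLetter", while Describe('_') returns "U+005F ConnectorPunctuation", which is why the underscore is rejected but Š is not.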
Upvotes: 3
Views: 25450
Reputation: 7921
I can't help but notice that everyone seems to be missing the real issue: your data "corruption" appears to be an obvious character encoding problem. Therefore, no matter what you do with the data, you will be (mis)treating the symptom and ignoring the root cause.
To be specific, you appear to be interpreting the received binary BYTES as ASCII (or some single-byte "ANSI" code page) text, when those BYTES were almost certainly intended to represent text in some other encoding.
You should find out what character encoding applies to the string of text that you received. Then you should read that data while applying the appropriate character encoding transformations.
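To illustrate the point that the bytes are intact and only their interpretation is wrong, here is a minimal C# sketch that encodes text with one encoding and decodes it with another. The actual encodings behind the question's data are unknown; UTF-8 misread as ISO-8859-1 is assumed purely as an example, and the class and method names are hypothetical:

```csharp
using System.Text;

public static class MojibakeDemo
{
    // Encodes text with one encoding, then decodes the same bytes with another.
    // When the two encodings differ, the output is garbled even though
    // no byte has been changed.
    public static string Misread(string original, Encoding writtenWith, Encoding readWith)
    {
        byte[] bytes = writtenWith.GetBytes(original);
        return readWith.GetString(bytes);
    }
}
```

For example, "é" written as UTF-8 is the two bytes 0xC3 0xA9; read back as ISO-8859-1 (Encoding.GetEncoding("iso-8859-1")) it becomes "Ã©", while reading it back as UTF-8 restores "é". The fix is to decode with the encoding the sender actually used, not to filter the garbled characters afterwards.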
You should read Joel Spolsky's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets", especially the section headed "There Ain't No Such Thing As Plain Text", which makes exactly this point.
Upvotes: 13
Reputation: 446
The most reliable way to ensure that you are dealing with printable ASCII characters, regardless of the encoding used by the program or by the string in question, is to check that each character's value lies between 32 and 126 (127 is Delete, which is not actually a printable character).
For example:
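Imports System.Runtime.CompilerServices

Public Module StringExtensions
    ' Returns True if every character is ASCII. With bPrintableOnly = True (the default),
    ' only 32 (space) through 126 (~) are allowed: 127 (Delete) and 0-31 (control
    ' characters) are non-printing.
    <Extension()>
    Public Function IsASCII(inString As String, Optional bPrintableOnly As Boolean = True) As Boolean
        Dim lowerLimit As Integer = If(bPrintableOnly, 32, 0)
        Dim upperLimit As Integer = If(bPrintableOnly, 127, 128) ' exclusive upper bound
        For Each ch As Char In inString
            ' AscW returns the UTF-16 code value. Asc maps through the ANSI code page
            ' and can return 63 ("?") for characters that code page cannot represent.
            Dim code As Integer = AscW(ch)
            If code < lowerLimit OrElse code >= upperLimit Then
                Return False
            End If
        Next
        Return True
    End Function
End Module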
Upvotes: 1
Reputation: 121
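Use the Asc(char) function. It returns the character's code in the current ANSI code page (0 through 255 on single-byte code pages), which you can compare against an ANSI character code chart. AscW returns the Unicode value instead, which does not depend on the code page.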
Upvotes: 0
Reputation: 29527
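' Requires Imports System.Text.RegularExpressions at the top of the file.
' Lowercase letters are flagged too; use "[^A-Za-z0-9]" to allow them.
For Each m As Match In Regex.Matches("I_999Š=ÄÖÆaðøñòòñ", "[^A-Z0-9]")
    Console.WriteLine($"Found a bad character '{m.Value}' at index {m.Index}")
Next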
or
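For Each c As Char In "I_999Š=ÄÖÆaðøñòòñ"
    ' AndAlso binds tighter than OrElse; the parentheses just make that explicit
    If Not ((c >= "A"c AndAlso c <= "Z"c) OrElse (c >= "0"c AndAlso c <= "9"c)) Then
        Console.WriteLine($"Found a bad character '{c}'")
    End If
Next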
EDIT:
Is there something wrong with this answer that warrants the two anonymous downvotes? Speak up, and I'll fix it. I notice that I left out a "Then" (fixed now), but I intended this as pseudocode.
Upvotes: 1
Reputation: 17718
You could use a regular expression to strip out the bad characters (use Regex.IsMatch instead if you only need to detect them):
str = Regex.Replace(str, "[^A-Za-z0-9]","", RegexOptions.None);
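A small C# sketch of the same approach that keeps the pattern in one place and exposes both detection and stripping. The class and method names are hypothetical. Explicit ranges are used because, by default, \d and \w in .NET also match non-ASCII digits and letters:

```csharp
using System.Text.RegularExpressions;

public static class RegexFilter
{
    // Matches any single character outside ASCII A-Z, a-z and 0-9.
    private static readonly Regex Disallowed = new Regex("[^A-Za-z0-9]");

    // Detection only: true if at least one disallowed character is present.
    public static bool HasBadChars(string s) => Disallowed.IsMatch(s);

    // Filtering: removes every disallowed character.
    public static string StripBadChars(string s) => Disallowed.Replace(s, "");
}
```

For example, StripBadChars("I_999Š=Äa") returns "I999a", and HasBadChars("I999a") returns false.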
Upvotes: 0
Reputation: 735
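Should just be (anchored with \A and \z, so that every character must match rather than just one):
if (Regex.IsMatch(input, @"\A[A-Za-z0-9]+\z"))
{
    // input consists only of ASCII letters and digits
}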
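Without anchors, Regex.IsMatch succeeds as soon as any single character matches, so "I_999Š" passes the pattern "[A-Za-z0-9]" simply because it contains "I" and "9". A minimal C# sketch contrasting the unanchored and anchored forms (the names are hypothetical); \z is used rather than $ because in .NET $ also matches just before a trailing newline:

```csharp
using System.Text.RegularExpressions;

public static class RegexWholeString
{
    // Unanchored: true if at least one character is an ASCII letter or digit.
    public static bool ContainsAlnum(string s) => Regex.IsMatch(s, "[A-Za-z0-9]");

    // Anchored at both ends: true only if the string is non-empty and
    // every character is an ASCII letter or digit.
    public static bool IsAllAlnum(string s) => Regex.IsMatch(s, @"\A[A-Za-z0-9]+\z");
}
```

With these, ContainsAlnum("I_999Š") is true but IsAllAlnum("I_999Š") is false, and IsAllAlnum("I999\n") is also false, which a pattern ending in $ would not guarantee.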
Upvotes: 1
Reputation: 37740
Well, there are two quick options. The first is to use a regular expression; the second is to use the Asc() function to check whether each character's ASCII value falls within the allowed ranges. I would personally use Asc() for this.
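A minimal C# sketch of the value-range approach. A C# char is a UTF-16 code unit, so comparing it against 'A'..'Z' and '0'..'9' works like VB's AscW (not Asc) and does not depend on the ANSI code page. The class and method names are hypothetical; on .NET 7 and later, char.IsAsciiLetterUpper and char.IsAsciiDigit perform the same two tests:

```csharp
using System.Collections.Generic;

public static class AsciiRangeCheck
{
    // True only for 'A'..'Z' and '0'..'9', compared by UTF-16 code value,
    // so culture and Unicode categories play no part.
    public static bool IsUpperAsciiLetterOrDigit(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

    // Returns the index of every character that falls outside the allowed ranges.
    public static List<int> FindBadIndices(string s)
    {
        var bad = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            if (!IsUpperAsciiLetterOrDigit(s[i]))
            {
                bad.Add(i);
            }
        }
        return bad;
    }
}
```

For "I_999Š", FindBadIndices returns the indices 1 (the underscore) and 5 (Š). Lowercase a-z is rejected as well, matching the question's A-Z; widen the first comparison if lowercase should be allowed.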
Upvotes: 6