Reputation: 423
UPDATED BELOW
I am reading a binary file using BinaryReader in VB.NET. The structure of each row in the file is:
"Category" = 1 byte
"Code" = 1 byte
"Text" = 60 bytes
Dim Category As Byte
Dim Code As Byte
Dim byText() As Byte
Dim chText() As Char
Dim br As New BinaryReader(fs)
Category = br.ReadByte()
Code = br.ReadByte()
byText = br.ReadBytes(60)
chText = encASCII.GetChars(byText)
The problem is that the "Text" field contains unwanted characters used for padding. Most of them seem to be 0x00 null characters.
Is there any way to get rid of these 0x00 characters through the encoding?
Otherwise, how can I do a replace on the chText array to get rid of the 0x00 characters? I am trying to serialize the resulting DataTable to XML, and it fails on these non-compliant characters. I am able to loop through the array, but I cannot figure out how to do the replace.
UPDATE:
This is where I am, with a lot of help from the people below. The first solution works, but it is not as flexible as I had hoped. The second one fails for one use case, but is much more generic.
Ad 1) I can solve the issue by passing the string to this function
Public Function StripBad(ByVal InString As String) As String
Dim sb As New System.Text.StringBuilder
For Each ch As Char In InString
' Replace every character at or below 0x25 ("%") with a space.
' This covers 0x00 and the other control characters, but also a few printable ones.
If AscW(ch) <= &H25 Then
sb.Append(" "c)
Else
sb.Append(ch)
End If
Next
Return sb.ToString()
End Function
Ad 2) This routine takes out several offending characters, but fails for 0x00. It was adapted from MSDN, http://msdn.microsoft.com/en-us/library/kdcak6ye.aspx.
Public Function StripBadwithConvert(ByVal InString As String) As String
Dim unicodeString As String
unicodeString = InString
' Create two different encodings.
Dim ascii As Encoding = Encoding.ASCII
Dim [unicode] As Encoding = Encoding.UTF8
' Convert the string into a byte[].
Dim unicodeBytes As Byte() = [unicode].GetBytes(unicodeString)
Dim asciiBytes As Byte() = Encoding.Convert([unicode], ascii, unicodeBytes)
Dim asciiChars(ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length) - 1) As Char
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0)
Dim asciiString As New String(asciiChars)
Return asciiString
End Function
Upvotes: 1
Views: 4479
Reputation: 700152
First of all, you should find out what the format of the text is, so that you are not just blindly removing something without knowing what you hit.
Depending on the format, you use different methods to remove the characters.
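One way to find out what the padding actually is: dump the raw bytes of a field as hex and look at what follows the visible text. This is only a sketch; the `DumpField` helper name and the sample bytes are made up for illustration and stand in for the 60 bytes read into `byText`.

```vbnet
Imports System

Module FieldInspector
    ' Hypothetical helper: renders a field as hyphen-separated hex pairs,
    ' so padding bytes such as 0x00 or 0x20 are easy to spot.
    Public Function DumpField(ByVal field As Byte()) As String
        Return BitConverter.ToString(field)
    End Function

    Sub Main()
        ' Sample data standing in for one "Text" field (shortened here).
        Dim sample As Byte() = {&H48, &H69, &H0, &H0}
        Console.WriteLine(DumpField(sample)) ' prints "48-69-00-00"
    End Sub
End Module
```

If the dump shows only trailing 00 bytes, the "cut at the first zero" approach is enough; if other byte values show up, the filtering approach is the safer choice.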
To remove only the zero characters:
Dim len As Integer = 0
For pos As Integer = 0 To byText.Length - 1
If byText(pos) <> 0 Then
byText(len) = byText(pos)
len += 1
End If
Next
strText = Encoding.ASCII.GetString(byText, 0, len)
To remove everything from the first zero character to the end of the array:
Dim len As Integer = 0
While len < byText.Length AndAlso byText(len) <> 0
len += 1
End While
strText = Encoding.ASCII.GetString(byText, 0, len)
Edit:
If you just want to keep whatever happens to be a printable ASCII character:
Dim len As Integer = 0
For pos As Integer = 0 To byText.Length - 1
' 32 to 126 is the printable ASCII range; 127 (DEL) is a control character.
If byText(pos) >= 32 AndAlso byText(pos) <= 126 Then
byText(len) = byText(pos)
len += 1
End If
Next
strText = Encoding.ASCII.GetString(byText, 0, len)
Upvotes: 3
Reputation: 11740
You can use a struct to load the data:
using System.Runtime.InteropServices;

// Sequential layout with Pack = 1 gives the record layout from the question: 1 + 1 + 60 bytes.
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
internal struct TextFileRecord
{
public byte Category;
public byte Code;
// ByValTStr marshals an inline, fixed-length character array; LPTStr would be a pointer to a string.
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 60)]
public string Text;
}
You have to adjust the marshaling attributes (the UnmanagedType argument and, where it applies, the CharSet) to fit your string encoding.
Upvotes: 0
Reputation: 545488
If the null characters are used to right-pad (i.e. terminate) the text, which would be the normal case, this is fairly easy:
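Dim strText As String = encASCII.GetString(byText)
' Index of the first null character, or -1 if there is none.
Dim strlen As Integer = strText.IndexOf(Chr(0))
If strlen <> -1 Then
' Keep everything before the first null character.
strText = strText.Substring(0, strlen)
End If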
If not, you can still do a normal Replace on the string. It would be slightly “cleaner” if you did the pruning in the byte array, before converting it to a string. The principle remains the same, though.
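' CByte(0) makes sure a Byte is searched for, not a boxed Integer.
Dim strlen As Integer = Array.IndexOf(byText, CByte(0))
If strlen = -1 Then
' No terminator found: use the whole array.
strlen = byText.Length
End If
Dim strText = encASCII.GetString(byText, 0, strlen)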
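For the case where the nulls are not just trailing padding, the string-level Replace mentioned above could look roughly like this. It is a minimal sketch: the `DecodeWithoutNulls` name and the sample bytes are invented for illustration, and ASCII is assumed as the encoding, as in the question.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module NullStripper
    ' Hypothetical helper: decodes the field as ASCII and drops every null character,
    ' wherever it occurs in the text.
    Public Function DecodeWithoutNulls(ByVal field As Byte()) As String
        Dim decoded As String = Encoding.ASCII.GetString(field)
        ' Replacing with an empty string deletes each match.
        Return decoded.Replace(vbNullChar, "")
    End Function

    Sub Main()
        ' Sample data: "A", null, "B", followed by two padding nulls.
        Dim sample As Byte() = {&H41, &H0, &H42, &H0, &H0}
        Console.WriteLine(DecodeWithoutNulls(sample)) ' prints "AB"
    End Sub
End Module
```

Unlike the IndexOf variants, this keeps text that follows an embedded null, which may or may not be what you want, depending on the file format.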
Upvotes: 0