Reputation: 10012
We store data as BLOBs in a database (ugh, I know). On my website I retrieve the data, read it into a byte array, and then convert it to a string for display. However, as you can see below, I'm getting weird characters in the text when viewing it in debug mode.
Hi John
�
I look forward to receipt of your instructions in due course.
�
Kind regards
�
When it renders, it looks like this:
Hi John�I look forward to receipt of your instructions in due course.�Kind regards�
Currently the code is:
' Build a disconnected recordset with one binary field to collect the BLOB chunks.
Dim RSFileNote As New ADODB.Recordset
RSFileNote.Fields.Append("FileNote", 205, intSizeofBlob) ' 205 = adLongVarBinary
RSFileNote.Open()
RSFileNote.AddNew()
' Append each stored segment in row order.
For n As Integer = 0 To dsVecSegment.Tables(0).Rows.Count - 1
    RSFileNote("FileNote").AppendChunk(dsVecSegment.Tables(0).Rows(n).Item("SDATA"))
Next
RSFileNote.Update()
' Decode the assembled bytes as UTF-8.
Dim vOut As String = System.Text.Encoding.UTF8.GetString(RSFileNote("FileNote").Value)
I would have thought the UTF-8 encoding would resolve this, but does anyone know what I can do to fix it on my side? (Getting the content in the database corrected isn't an option.)
Ideally I want to remove the extraneous characters and replace the line breaks (which are present in the .Value during debug) with line breaks that actually work.
Update
I think the issue lies in the fact that emails are copied and pasted into the initial input field that gets stored in the database, so they carry over artifacts from Outlook into the field.
Update 2
Having taken Esailija's answer into account, the � characters are gone; however, the line breaks are still mysteriously going missing.
I would post the full output, but it contains private data. For emails that have been pasted in, though, the end of the content is encoded as:
,wd-s.@ÓyøYð&¥¥ÀAàA•F• € p IØ%Ð`ÐîèØMà!µì$ô#i!°p1¤ Ið-œ)) -„U€. x.y.)¨}U¹ M½!;¹4%;¨5˜6)˜2YA'8<1<8<9•=; !:$Ì78è# Ùœ<ÐNÌ'Á',A yGÅC ±]Õ 1 õH¥Ve„8¥9dN¹FMX hX`Kè¸XÍ”U”dnÕU-€W@U`N%PDE
Upvotes: 2
Views: 484
Reputation: 140230
The Unicode replacement character (�) indicates an error when decoding a byte sequence: the byte sequence is not valid in the chosen UTF encoding, in this case UTF-8. Any invalid UTF-8 sequences are replaced with the replacement character in the result. It can also be used literally as a normal character, but this doesn't seem to be the case here.
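To see this behaviour in isolation, here is a minimal sketch. The sample bytes are made up for illustration (they are not taken from the question's data): &HA0 is a non-breaking space in Windows-1252, but on its own it is not a valid UTF-8 sequence, so the default UTF-8 decoder substitutes U+FFFD for it.

```vbnet
Imports System
Imports System.Text

Module ReplacementCharDemo
    Sub Main()
        ' Illustrative bytes only: "Hi", then &HA0, then "J".
        ' &HA0 alone is not a valid UTF-8 sequence.
        Dim raw As Byte() = {&H48, &H69, &HA0, &H4A}

        ' Encoding.UTF8 replaces invalid sequences instead of throwing.
        Dim asUtf8 As String = Encoding.UTF8.GetString(raw)

        ' Check whether the decoded text contains U+FFFD.
        Dim hasReplacement As Boolean = asUtf8.IndexOf(ChrW(&HFFFD)) >= 0
        Console.WriteLine(hasReplacement) ' prints True
    End Sub
End Module
```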
The reason is most likely that the encoding is not UTF-8. Without seeing the raw bytes, my best guess is that it's actually in CP1252.
So try this:
Dim enc As Encoding = Encoding.GetEncoding(1252)
Dim vOut As String = enc.GetString(RSFileNote("FileNote").Value)
Also, post a comment with the result you get using 1252, because the raw bytes can usually be deduced from that.
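If guessing from the decoded text is awkward, another option is to look at the raw bytes directly. The sketch below is only an illustration: `ToHex` is a hypothetical helper and the sample bytes are invented. In the question's code the input would presumably be the Byte array held in RSFileNote("FileNote").Value.

```vbnet
Imports System

Module HexDumpDemo
    ' Hypothetical helper: formats up to maxBytes leading bytes as hex pairs.
    Function ToHex(data As Byte(), maxBytes As Integer) As String
        Dim count As Integer = Math.Min(maxBytes, data.Length)
        Return BitConverter.ToString(data, 0, count)
    End Function

    Sub Main()
        ' Invented sample: "Hi", CR, LF, then &HA0.
        Dim sample As Byte() = {&H48, &H69, &HD, &HA, &HA0}
        Console.WriteLine(ToHex(sample, 64)) ' prints 48-69-0D-0A-A0
    End Sub
End Module
```

A lone byte such as A0 in the dump, with no UTF-8 lead byte before it, would be consistent with a single-byte encoding like CP1252.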
Upvotes: 2
Reputation: 6554
It's a nasty fix, but you could do this:
vOut = vOut.Replace("�", vbCrLf)
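If the line breaks still disappear when the text is rendered, one possible cause is that the page is HTML, where CR/LF characters collapse to whitespace. That is an assumption, since the question does not show how the string is rendered. Under that assumption, a rough sketch would be to normalise the line endings and emit `<br />` tags; `ToHtmlLines` is a hypothetical helper name.

```vbnet
Imports System
Imports System.Net
Imports Microsoft.VisualBasic

Module LineBreakDemo
    ' Hypothetical helper: HTML-encodes the text and turns every
    ' CRLF, CR or LF into a <br /> tag. Assumes HTML output.
    Function ToHtmlLines(text As String) As String
        ' Collapse the three line-ending styles to a single LF first.
        Dim normalised As String = text.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf)
        ' Encode before inserting markup so the <br /> tags survive.
        Return WebUtility.HtmlEncode(normalised).Replace(vbLf, "<br />")
    End Function

    Sub Main()
        Dim sample As String = "Hi John" & vbCrLf & "Kind regards"
        Console.WriteLine(ToHtmlLines(sample)) ' prints Hi John<br />Kind regards
    End Sub
End Module
```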
Upvotes: 2