nanonerd

Reputation: 1984

How to handle weird unicode characters

NSTR 2009-A – Underlying got a $1.3MM ($91.3MM remains).  C/E rose to 67.1%

Below is the image of the above text in Notepad++ with "Encode in UTF-8" turned on. The 'x96' is a dash and the 'xA0' bytes are spaces. SQL Server gives an Invalid Character error. How do I get rid of these @#$#? It's causing me a huge headache trying to fix ... ;-x

[Screenshot: the above text as displayed in Notepad++]

I tried the code below. It kept the dash but changed the 'xA0' characters to question marks:

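// notesXML is an existing string variable holding the XML text
byte[] tempBytes;
tempBytes = System.Text.Encoding.GetEncoding("ISO-8859-8").GetBytes(notesXML);
// assign back to notesXML (redeclaring it here would not compile)
notesXML = System.Text.Encoding.UTF8.GetString(tempBytes);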

Tips appreciated, thanks !

Upvotes: 1

Views: 2497

Answers (1)

TachyonVortex

Reputation: 8572

It looks like the encoding of your original text could be Windows-1252, where those two byte values map to:

96 = U+2013 : EN DASH
A0 = U+00A0 : NO-BREAK SPACE

So if you decode the original bytes with System.Text.Encoding.GetEncoding("Windows-1252"), you should be able to read your original text without corrupting it, and you can then convert it to whatever encoding is being used by your database (e.g. UTF-8).
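
As a rough sketch of that idea (not tested against your data): the class name `Cp1252Demo`, the helper `DecodeCp1252`, and the sample bytes are made up for illustration, and it assumes you can get at the raw bytes of the text before anything has decoded them with the wrong encoding. The `RegisterProvider` call is only needed on .NET Core / .NET 5+, where code page encodings are not registered by default; on the classic .NET Framework, Windows-1252 is available without it.

```csharp
using System;
using System.Text;

public static class Cp1252Demo
{
    // Hypothetical helper: decodes raw bytes as Windows-1252 into a .NET string.
    public static string DecodeCp1252(byte[] raw)
    {
        // Makes code page encodings such as Windows-1252 available on
        // .NET Core / .NET 5+. Calling it more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding cp1252 = Encoding.GetEncoding("Windows-1252");
        return cp1252.GetString(raw);
    }

    public static void Main()
    {
        // Made-up sample: 'A', space, 0x96 (dash), 0xA0 (no-break space), 'B'
        byte[] raw = { 0x41, 0x20, 0x96, 0xA0, 0x42 };

        string text = DecodeCp1252(raw);
        Console.WriteLine(((int)text[2]).ToString("X4")); // 2013 (EN DASH)
        Console.WriteLine(((int)text[3]).ToString("X4")); // 00A0 (NO-BREAK SPACE)

        // Re-encode as UTF-8, e.g. before writing to a file or sending to the database.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        Console.WriteLine(BitConverter.ToString(utf8)); // 41-20-E2-80-93-C2-A0-42
    }
}
```

Once the text is a correctly decoded .NET string, both characters are ordinary Unicode characters, so if you would rather have plain ASCII you could also swap them with `text.Replace('\u2013', '-').Replace('\u00A0', ' ')`.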

Upvotes: 2
