kpollock

Reputation: 3989

'Strange characters' work fine in SQL Server but not when read from CSV

Please excuse the odd title, but I don't know how better to describe my issue.

Our SQL Server (2008) database has, quite legitimately, data in text fields that looks like

"Microsoft XML ·ÖÎö³ÌÐòºÍ SDK (Unknown)"

I am reading data in from CSV files in C#, and the files have the same sort of data. We are using the LumenWorks.Framework.IO.Csv CsvReader (because we sometimes need to deal with really big files). We have the source code for this.

These fields look fine (i.e. as above) in the CSV file itself, but when the data is read in from the csv it ends up represented as

'Microsoft XML ��������� SDK (Unknown)'

This is wrong, and (obviously) it does not find a match when used in queries back to the database. I can query fine using the original string in SSMS.

I am hampered in web searching because I struggle to find the correct terms for the issue!

Can anyone explain this issue in the proper terms, and maybe suggest what sort of things I should be looking for in the CsvReader code (or ours) that might cause this mistranslation?

Upvotes: 1

Views: 990

Answers (1)

LukeH

Reputation: 269628

I suspect that you need to specify the encoding of your CSV file.

If you're currently doing something like this:

using (var csv = new CsvReader(new StreamReader("foo.csv"), true))
{
    // ...
}

...then try something like this instead:

using (var csv = new CsvReader(new StreamReader("foo.csv", Encoding.Unicode), true))
{
    // ...
}

Note that I don't know which encoding you'll need. I've used Encoding.Unicode (which is UTF-16) purely as an example. (I think StreamReader defaults to UTF-8 if you don't specify an encoding.)
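
To see why the choice of encoding matters, here is a minimal sketch. It assumes the CSV stores each of the odd characters as a single byte (Latin-1 / Windows-1252 style), which is only a guess about your file. The class name EncodingDemo and the helper Decode are hypothetical names made up for this illustration; they are not part of CsvReader.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decode raw bytes through a StreamReader,
    // the same way a file would be read with an explicit encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumption: the file is single-byte encoded (ISO-8859-1 here).
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // The string from the question, with the non-ASCII characters
        // written as escapes so the source file's own encoding is irrelevant.
        string original = "Microsoft XML "
            + "\u00B7\u00D6\u00CE\u00F6\u00B3\u00CC\u00D0\u00F2\u00BA\u00CD"
            + " SDK (Unknown)";

        // Simulate the bytes as they would sit on disk.
        byte[] fileBytes = latin1.GetBytes(original);

        // Decoding as UTF-8: the bytes above 0x7F are not valid UTF-8
        // sequences, so they become U+FFFD replacement characters.
        string asUtf8 = Decode(fileBytes, Encoding.UTF8);

        // Decoding with the encoding the bytes were written in round-trips.
        string asLatin1 = Decode(fileBytes, latin1);

        Console.WriteLine(asUtf8.Contains("\uFFFD"));   // True
        Console.WriteLine(asLatin1 == original);        // True
    }
}
```

If this matches what you see, passing the file's real encoding to the StreamReader (for example Encoding.GetEncoding(1252) or Encoding.Default, depending on how the file was written) should stop the characters from being replaced.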

Upvotes: 2
