user1018711


Character Encoding in .NET

I have exported an Excel 2007 document as CSV (separated by semicolons). I am using Czech Office 2010 and Czech Windows 7.

When I read the file in .NET (C#), text containing special Czech characters is corrupted. This happens when I use something like

string[] lines = File.ReadAllLines(path); (from System.IO.File)

So I guess I need to specify the right encoding explicitly, right? So I tried:

string[] lines = File.ReadAllLines(path, encoding);

where the encoding variable was defined like

Encoding encoding = Encoding.UTF8; for example.

None of the options worked. Strangest of all, some of them, like Encoding.Unicode, even threw an IndexOutOfRangeException.

How should I fix this encoding problem? Thank you.

BTW, my Office installation manages to open and read that document correctly.

Upvotes: 0

Views: 7469

Answers (2)

RobV

Reputation: 28636

I seem to remember hitting this a couple of years ago with CSV files exported from Excel.

From searching the web, it seems that Office uses different encodings depending on your version of Office and your region.

In my case I believe the correct encoding was something weird like UTF7 (wtf), so try that. Otherwise you may be stuck trying every encoding until one decodes properly.

The other option is to look for a tool designed to detect the encoding of a file and run it over your input to determine the encoding.
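As a rough illustration of the "try encodings until one decodes properly" idea, here is a minimal sketch, not a general detector. It assumes only two candidates: strict UTF-8 first, then windows-1250 as the fallback. The class and method names (EncodingGuess, ReadTextGuessingEncoding) are made up for this example, and the RegisterProvider call assumes .NET Core 3.0 or later, where legacy code pages are not available until registered; on .NET Framework that line should not be needed.

```csharp
using System.IO;
using System.Text;

public static class EncodingGuess
{
    // Hypothetical helper: decodes a file as UTF-8 if the bytes are valid UTF-8,
    // otherwise falls back to windows-1250 (an assumption for Czech/Polish systems).
    public static string ReadTextGuessingEncoding(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);

        // Strict UTF-8: throws on invalid byte sequences instead of
        // silently substituting the replacement character.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);

        try
        {
            // Skip the UTF-8 byte order mark (EF BB BF) if the file starts with one.
            bool hasBom = bytes.Length >= 3
                && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
            int offset = hasBom ? 3 : 0;
            return strictUtf8.GetString(bytes, offset, bytes.Length - offset);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8, so assume the legacy Central European code page.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding("windows-1250").GetString(bytes);
        }
    }
}
```

This works as a heuristic because text with accented characters saved in a single-byte code page is very unlikely to form valid UTF-8. It cannot tell windows-1250 apart from other single-byte code pages, so the fallback still has to be chosen by hand.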

Upvotes: 1

Wiktor Zychla

Reputation: 48230

Most probably the encoding Excel writes your file in is the default encoding of your system, which should be windows-1250. Open your file with either Encoding.Default or Encoding.GetEncoding("windows-1250"). It works for us here in Poland. I don't remember any issues regarding CSVs exported from Office.
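For illustration, a minimal sketch of reading the semicolon-separated export with an explicit windows-1250 encoding. The names CzechCsv and ReadCentralEuropeanCsv are hypothetical, and the plain Split(';') assumes no field contains a quoted semicolon. One caveat: Encoding.Default is the system ANSI code page only on .NET Framework; on .NET Core and later it is UTF-8, and legacy code pages have to be registered first, which is what the RegisterProvider line assumes (.NET Core 3.0 or later; on .NET Framework it should not be needed).

```csharp
using System.IO;
using System.Text;

public static class CzechCsv
{
    // Hypothetical helper: reads a semicolon-separated file saved as windows-1250
    // and returns one string array per line.
    public static string[][] ReadCentralEuropeanCsv(string path)
    {
        // On .NET Core / .NET 5+ legacy code pages must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding encoding = Encoding.GetEncoding("windows-1250");

        string[] lines = File.ReadAllLines(path, encoding);
        var rows = new string[lines.Length][];
        for (int i = 0; i < lines.Length; i++)
        {
            // Naive split: does not handle quoted fields containing semicolons.
            rows[i] = lines[i].Split(';');
        }
        return rows;
    }
}
```

Naming the code page explicitly, rather than relying on Encoding.Default, keeps the result the same when the program runs on a machine with a different regional setting.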

Upvotes: 7
