Encoding problem with reading website, three different encodings

Question

I have a problem with a WebRequest in C#. The target is a Google page.

The header states

text/html; charset=ISO-8859-1

The website states

And finally, I only get the expected result, both in the debugger and in the regular expression, when I use Encoding.Default, which resolves to System.Text.SBCSCodePageEncoding.

Now what do I do? Do you have any hints as to how this could happen or how I could solve it?

The actual encoding of the page seems to be UTF-8. At least Firefox displays it correctly as UTF-8, but not as Windows-Whatever or Latin1.

The URL is this

The problem affects the € sign as well as all German umlauts.

Thanks in advance for your help on this problem which is making me seriously crazy!

Update: when I output the string via

// create a writer and open the file; the using block closes it when done
using (TextWriter tw = new StreamWriter("test.txt"))
{
    // write a line of text to the file
    tw.WriteLine(html);
}

it all works fine.

So it seems the problem is that the debugger does not show the correct encoding, and neither does the regular expression.

How do I tell C# to handle the RegEx as UTF-8?
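
For reference, a minimal sketch of the comparison, assuming the response body really is UTF-8 as Firefox suggests. The sample text, the class name EncodingComparison, and the helper EuroSignSurvives are hypothetical stand-ins, since the real page is not reproduced here. It decodes the same bytes with two of the candidate encodings and checks whether a regular expression finds the € sign. Once decoded, a .NET string is UTF-16 and Regex takes no encoding of its own, so the choice that matters is the one made when the bytes are turned into a string.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

class EncodingComparison
{
    // Hypothetical helper: decode the raw bytes with the given encoding and
    // report whether the euro sign (U+20AC) is present in the resulting string.
    public static bool EuroSignSurvives(byte[] body, Encoding encoding)
    {
        string html = encoding.GetString(body);
        return Regex.IsMatch(html, "\u20AC");
    }

    static void Main()
    {
        // Assumption: stand-in for the response body, encoded as UTF-8.
        // The text is "Preis: 5 € für Müller", written with escapes.
        byte[] body = Encoding.UTF8.GetBytes("Preis: 5 \u20AC f\u00FCr M\u00FCller");

        Encoding[] candidates =
        {
            Encoding.UTF8,
            Encoding.GetEncoding("ISO-8859-1"), // the charset named in the header
        };

        foreach (Encoding candidate in candidates)
        {
            Console.WriteLine("{0}: euro sign matched = {1}",
                candidate.WebName, EuroSignSurvives(body, candidate));
        }
    }
}
```

With UTF-8 input like this, only the UTF-8 decode should yield a match; decoding the same bytes as ISO-8859-1 turns each multi-byte character into several unrelated ones before the Regex ever sees them.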
