Reputation: 3880
I have come across something strange and I'd like your opinion.
There is a web page that contains a span element with some Greek text in its InnerText and InnerHtml properties.
The encoding of the page is Greek (Windows).
My if statement is:
if (mySpan != null && mySpan.InnerText.Contains(greekText))
This line works 100%, but my previous non-working code was:
if (mySpan != null && browser.DocumentText.Contains(greekText))
This line did not work, and when I clicked on the preview within the debugger I noticed that the Greek text was unreadable (strange symbols instead of Greek characters). However, all the other elements that contained Greek text were read successfully by the application; that is, I could save their attributes in variables and use them. Is there any explanation why DocumentText failed and InnerText succeeded?
Upvotes: 2
Views: 4337
Reputation: 17719
Looking at the source for WebBrowser.DocumentText, it would appear that it uses UTF-8 encoding by default:
public string DocumentText
{
    get
    {
        Stream documentStream = this.DocumentStream;
        if (documentStream == null)
            return "";
        // No encoding is passed here, so StreamReader falls back to UTF-8.
        StreamReader streamReader = new StreamReader(documentStream);
        documentStream.Position = 0L;
        return streamReader.ReadToEnd();
    }
    // set accessor omitted
}
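That is, a StreamReader constructed without an explicit encoding assumes UTF-8 (unless the stream starts with a byte order mark), so the bytes of a Greek (Windows) page are decoded incorrectly.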
See this link for getting around this issue
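As a rough illustration of one possible workaround (not necessarily what the linked page describes), the sketch below reads the same bytes twice: once as UTF-8 and once with the Greek (Windows) code page, 1253. The ReadAll helper, the EncodingDemo class and the sample text are made up for the example, and it assumes .NET Framework 4.5 or later; on .NET Core / .NET 5+ the code page would likely need to be registered first via CodePagesEncodingProvider.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: reads a whole stream as text using an explicit encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        stream.Position = 0;
        // leaveOpen: true so the caller can rewind and read the stream again.
        using (var reader = new StreamReader(stream, encoding, false, 1024, true))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Assumption: the page is encoded as Greek (Windows), i.e. code page 1253.
        Encoding greek = Encoding.GetEncoding(1253);

        // "Ελληνικά", written with escapes so the source file encoding does not matter.
        string greekText = "\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC";

        // Stand-in for browser.DocumentStream: the raw bytes of a Greek (Windows) page.
        var pageBytes = new MemoryStream(greek.GetBytes("<span>" + greekText + "</span>"));

        string asUtf8 = ReadAll(pageBytes, Encoding.UTF8); // what DocumentText effectively does
        string asGreek = ReadAll(pageBytes, greek);        // decoding with the page's encoding

        Console.WriteLine(asUtf8.Contains(greekText));  // False
        Console.WriteLine(asGreek.Contains(greekText)); // True
    }
}
```

With the actual control, the same idea would presumably be to pass browser.DocumentStream to such a helper instead of reading browser.DocumentText, after checking that the stream is not null.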
I can only assume that browser.Document.GetElementById(mySpanId) respects the stated encoding of the page, which is why you see the text correctly when using this call.
Upvotes: 2