Haim Kashi

Reputation: 399

Why is not everything shown when I load an HTML file from the hard disk through WebBrowser?

I have code that downloads the HTML file of a website and saves it to my hard disk. In the constructor:

var uri = new Uri("http://www.walla.co.il");

Then:

DownloadHtml();

private void DownloadHtml()
{
    using (var client = new WebClient())
    {
        client.DownloadFile(webSite, OriginalHtmlFilePath);
    }
}

Then, after doing some processing on the downloaded HTML file, I load it into the WebBrowser:

string html = File.ReadAllText(ScrambledHtmlFilePath);
webBrowser1.DocumentText = html;

If the website is, for example, http://www.cnn.com, the WebBrowser loads it with no problems. If the site is, for example, http://www.walla.co.il, some images and other content do not show up when I load it in the WebBrowser.

For both sites, when I load them in the WebBrowser I get many script errors, and I have to click YES many times to keep loading the page.

Script Error

An error has occurred in the script on this page

line char ....

Do you want to continue running scripts on this page?

I then select YES, and keep doing so until the page is loaded. If the HTML file is the cnn.com content, the page loads fine after many YES clicks.

But if the HTML file is in Hebrew, for example walla.co.il, then after clicking YES many times, in the end I see this:

[screenshot of the rendered page]

The original site does not look like this at all.

Upvotes: 1

Views: 822

Answers (2)

Alireza Noori

Reputation: 15275

When you download the page, you download only the source code of that page. However, when your browser downloads the page, a lot of other files are downloaded along with the HTML: for instance, JavaScript files, CSS files (for styling), and more. Even if you download those as well, you may need to modify your HTML so that it links to them with a relative or absolute path (depending on your needs).

In other words, the web page is not shown as you expect because the linked resources are not downloaded and linked to the HTML.

Update

When you set the document's source directly, resources (CSS, JS, etc.) that the HTML references by a path relative to the original site cannot be found, and are therefore not used in the web page. For instance:

<link rel="stylesheet" type="text/css" href="//cdn.sstatic.net/stackoverflow/all.css?v=a25094f085c0">

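names its host explicitly, so it does not depend on which site the page came from (it still relies on the page's context for the http or https scheme), but: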

<link rel="stylesheet" type="text/css" href="/css/all.css">

won't.

Update 2

This is exactly why you get your script errors. The browser cannot find some (or most, or even all) of the attached scripts (the .js files referenced by <script> tags), and when it tries to run the JavaScript code that depends on them, it fails.

Upvotes: 3

Joel Coehoorn

Reputation: 416131

The problem is relative vs absolute paths.

When a browser shows an HTML page, it also needs to retrieve things like images, CSS stylesheets, and JavaScript. It knows where to find those things because of instructions in the HTML file. Sometimes the HTML instructions include a relative path instead of an absolute path. When there is a relative path, with no additional hints about the original location of the page, the web browser must resort to the current location or context of the page as the base to construct the full path for each of the relative items on the page. In this case, you just have a string variable, and so there is no context.
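To make the resolution step concrete, here is a minimal sketch of how a relative reference is combined with a base address, using the System.Uri class rather than the browser itself. The class name RelativeUrlDemo, the page address, and the two sample paths are assumptions made up for illustration; they are not taken from the actual walla.co.il markup.

```csharp
using System;

static class RelativeUrlDemo
{
    // Combines a reference found in the HTML with the address the page
    // originally came from, roughly as a browser would when it knows the base.
    public static string Resolve(string pageUrl, string reference)
    {
        var baseUri = new Uri(pageUrl);
        return new Uri(baseUri, reference).AbsoluteUri;
    }

    // Example calls (hypothetical paths):
    //   Resolve("http://www.walla.co.il/news/index.html", "/css/all.css")
    //   Resolve("http://www.walla.co.il/news/index.html", "images/logo.png")
}
```

Without a base address there is nothing to pass as pageUrl, which is the situation the WebBrowser is in when it is handed only a string.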

You can improve this by adding a base tag to the head section of your downloaded HTML files, if one does not already exist, to indicate the original location of the page and help the browser know what to do with relative links.
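One way this could look in code is sketched below. It is only a rough illustration under a few assumptions: the helper name EnsureBaseTag is made up, the page is assumed to have an ordinary opening head tag, the URL is assumed to contain no double quotes, and a regular expression stands in for a real HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

static class BaseTagHelper
{
    // Hypothetical helper: inserts <base href="..."> right after the opening
    // <head> tag, unless the document already declares a <base> element.
    public static string EnsureBaseTag(string html, string baseUrl)
    {
        // Respect a base tag the page already provides.
        if (Regex.IsMatch(html, @"<base\b", RegexOptions.IgnoreCase))
            return html;

        var headOpen = new Regex(@"<head\b[^>]*>", RegexOptions.IgnoreCase);
        if (!headOpen.IsMatch(html))
            return html; // no <head> found; this sketch leaves the text unchanged

        string baseTag = "<base href=\"" + baseUrl + "\">";

        // Replace only the first match. A MatchEvaluator is used so that any
        // '$' characters in the URL are not treated as substitution patterns.
        return headOpen.Replace(html, m => m.Value + baseTag, 1);
    }

    // Possible use with the code from the question:
    //   string html = File.ReadAllText(ScrambledHtmlFilePath);
    //   webBrowser1.DocumentText = EnsureBaseTag(html, "http://www.walla.co.il/");
}
```

With the base tag in place, relative links in the page can be resolved against the original site instead of against an empty context. Resources that the site serves only to its own pages may still fail to load.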

Upvotes: 0
