Tzuriel Yamin

Reputation: 73

Error when trying to load HTML with HtmlAgilityPack

I am trying to run this code:

string path = "http://warisons.rssing.com/chan1729325/all_p43.html";
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(path);
var div = htmlDoc.DocumentNode.Descendants("div");
foreach (var x in div)
{
    Console.WriteLine(x.Attributes["class"].Value);
}

When I debug this code, I get this error at htmlDoc.LoadHtml(path):

Locating source for 'd:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'. Checksum: MD5 {4e 14 d3 b d5 30 6e 2c bf 84 ab 8a 96 82 4a 8f}
The file 'd:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs' does not exist.
Looking in script documents for 'd:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'...
Looking in the projects for 'd:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'.
The file was not found in a project.
Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\'...
Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\vccorlib\'...
Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\atlmfc\src\mfc\'...
Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\atlmfc\src\atl\'...
Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\atlmfc\include'...
The debug source files settings for the active solution indicate that the debugger will not ask the user to find the file: d:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs.
The debugger could not locate the source file 'd:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'.

Upvotes: 0

Views: 2554

Answers (1)

Andrey Korneyev

Reputation: 26846

The way you are trying to load the HTML document from a URI is incorrect.

The method HtmlDocument.LoadHtml parses the string it is given, so its argument must be the HTML text itself, not a URI.

To load HTML from a URI, you need something like:

string path = "http://warisons.rssing.com/chan1729325/all_p43.html";
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlWeb().Load(path);
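
To make the difference concrete, here is a minimal sketch that puts the two entry points side by side. It assumes the HtmlAgilityPack NuGet package is referenced and that the URL from the question is still reachable; the class name LoadExamples and the inline markup are made up for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class LoadExamples
{
    static void Main()
    {
        // LoadHtml parses the string it receives as markup.
        // Passing a URL here would just parse the URL text itself,
        // so no <div> elements would be found.
        var fromString = new HtmlDocument();
        fromString.LoadHtml("<html><body><div class=\"post\">Hello</div></body></html>");
        Console.WriteLine(fromString.DocumentNode.Descendants("div").First().InnerText);

        // HtmlWeb.Load downloads the page at the URL and parses the response.
        // (Requires network access; the URL is the one from the question.)
        var fromWeb = new HtmlWeb().Load("http://warisons.rssing.com/chan1729325/all_p43.html");
        Console.WriteLine(fromWeb.DocumentNode.Descendants("div").Count());
    }
}
```

If you already have the page content as a string (for example, downloaded by other means), LoadHtml is the right call; otherwise let HtmlWeb fetch it.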

Also note that you can get a NullReferenceException here:

x.Attributes["class"].Value

since you're not checking whether the class attribute exists (x.Attributes["class"] != null) before accessing its value.
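
A possible way to guard against that, sketched under the same assumption that the HtmlAgilityPack package is referenced. The inline markup and the "(no class)" fallback text are invented for the example; GetAttributeValue is the library's helper that returns a default when the attribute is missing.

```csharp
using System;
using HtmlAgilityPack;

class ClassAttributeExample
{
    static void Main()
    {
        var htmlDoc = new HtmlDocument();
        // Illustrative markup: the second div has no class attribute.
        htmlDoc.LoadHtml("<div class=\"post\">a</div><div>b</div>");

        foreach (var div in htmlDoc.DocumentNode.Descendants("div"))
        {
            // Option 1: explicit null check before reading Value.
            var classAttr = div.Attributes["class"];
            if (classAttr != null)
            {
                Console.WriteLine(classAttr.Value);
            }

            // Option 2: GetAttributeValue returns the fallback
            // instead of throwing when the attribute is absent.
            Console.WriteLine(div.GetAttributeValue("class", "(no class)"));
        }
    }
}
```

Either option works; the second is shorter when a default value is acceptable.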

Upvotes: 1
