Metro1337

Reputation: 157

Can't find node using HTMLAgilityPack

I used the code sample from the following video: https://youtu.be/8e3Wklc1H_A

The code looks like this:

var webGet = new HtmlWeb();
var doc = webGet.Load("http://pastebin.com/raw.php?i=gF0DG08s");

HtmlNode OurNone = doc.DocumentNode.SelectSingleNode("//div[@id='footertext']");

if (OurNone != null)
    richTextBox1.Text = OurNone.InnerHtml;
else
    richTextBox1.Text = "nothing found";

At first I thought the original website (www.fuchsonline.com) might already be down, so I quickly made an HTML file containing only a footer and pasted it on Pastebin (link in the code above):

<html>
<body>

<div id="footertext">
                 <p>
                     Copyright &copy; FUCHS Online Ltd, 2013. All Rights Reserved.
                 </p>
</div>

</body>
</html>

When I use the Pastebin link in the code, the program always writes "nothing found" into the RichTextBox. However, the website used in the video is still up, so I tried passing that URL to webGet.Load instead and, voila, it works.

Now I'd like to ask what exactly is wrong here. Is the HTML missing something, or does the code only work for complete websites? If so, what makes a website complete?

Upvotes: 1

Views: 980

Answers (2)

Steve Wellens

Reputation: 20620

Here is a simpler way:

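// Requires: using System.Net; and a reference to HtmlAgilityPack.
// Download the page as a plain string first...
WebClient webClient = new WebClient();
string htmlCode = webClient.DownloadString("http://pastebin.com/raw.php?i=gF0DG08s");

// ...then hand that string to the parser directly.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);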

HtmlNode OurNone = doc.DocumentNode.SelectSingleNode("//div[@id='footertext']");

if (OurNone != null)
    richTextBox1.Text = OurNone.InnerHtml;
else
    richTextBox1.Text = "nothing found";

Upvotes: 2

inspiredcoder

Reputation: 165

In this instance the page only holds the raw HTML you saved, served as a plain string, which is why the lookup returns nothing. If you really want to parse it with HTML Agility Pack, you can first download the page, grab the raw HTML, and parse it into the Agility Pack's document model.

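// Requires: using System.IO; using System.Net; and a reference to HtmlAgilityPack.
WebRequest webRequest = WebRequest.Create("http://pastebin.com/raw.php?i=gF0DG08s");
webRequest.Method = "GET";
string pageSource;

// Download the raw response body as a string.
using (StreamReader reader = new StreamReader(webRequest.GetResponse().GetResponseStream()))
{
    pageSource = reader.ReadToEnd();
}

// Parse the downloaded string into the Agility Pack document model.
// Fully qualified because System.Windows.Forms also defines an HtmlDocument type.
HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
html.LoadHtml(pageSource);
HtmlNode OurNone = html.DocumentNode.SelectSingleNode("//div[@id='footertext']");
if (OurNone != null)
{
    richTextBox1.Text = OurNone.InnerHtml;
}
else
{
    richTextBox1.Text = "nothing found";
}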
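
To check whether the HTML itself is missing something, one option is to take the network out of the picture and parse the same markup from a local string. The following is only a minimal sketch, assuming a console project that references HtmlAgilityPack; the FooterCheck class, the SampleHtml constant, and the FindFooter helper are hypothetical names made up for this illustration.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack package is referenced

static class FooterCheck
{
    // The same markup that was pasted on Pastebin, held in a local string.
    public const string SampleHtml = @"<html>
<body>

<div id=""footertext"">
    <p>
        Copyright &copy; FUCHS Online Ltd, 2013. All Rights Reserved.
    </p>
</div>

</body>
</html>";

    // Hypothetical helper: parses markup from a string and returns
    // the footer div, or null when the XPath query matches nothing.
    public static HtmlNode FindFooter(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // no download and no HtmlWeb involved
        return doc.DocumentNode.SelectSingleNode("//div[@id='footertext']");
    }

    static void Main()
    {
        HtmlNode footer = FindFooter(SampleHtml);
        Console.WriteLine(footer != null ? footer.InnerHtml : "nothing found");
    }
}
```

If the node is found here, the markup is complete enough for the XPath query, and the difference lies in how the Pastebin page gets loaded rather than in the HTML.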

Upvotes: 1
