TomatoLion

Reputation: 81

Node is NULL using Xpath and HtmlAgilityPack

I've written a grabber for the IMDb website and now I need to parse the pages. I'm going to do it with HtmlAgilityPack.

For example, I've downloaded this page: link to IMDb

I've saved it as @"D:\IMDb.htm". From this page I need to extract the line that states how useful the review was, e.g. "1770 out of 2062 people found the following review useful:" from the first review.

My code is below. I believe the XPath is correct, but my node ends up being null:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using HtmlAgilityPack;


static void Main(string[] args)
{
    var doc = new HtmlDocument();
    doc.LoadHtml("D:\\IMDb.htm");
    Console.WriteLine("res", GetDescription("D:\\IMDb.htm"));
    Console.ReadLine();
}

public static string GetDescription(string html)
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();           
    doc.OptionFixNestedTags = true; 
    doc.Load(new StringReader(html));
    HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='tn15content']/div[1]/small[1]");
    return node.InnerHtml;
}

I'd appreciate any help, because I don't understand what's wrong.

Upvotes: 1

Views: 924

Answers (1)

har07

Reputation: 89285

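You shouldn't use StringReader here, because the html variable contains the path to the HTML file rather than the HTML markup itself. The StringReader makes the parser treat the path string as the document, so the XPath matches nothing and the node is null. Pass the path to doc.Load instead: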

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();           
doc.OptionFixNestedTags = true; 
doc.Load(html);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='tn15content']/div[1]/small[1]");
return node.InnerHtml;

Even if html did contain the markup, you wouldn't need a StringReader: you could use HAP's built-in method doc.LoadHtml(html).
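To make the distinction concrete, here is a minimal sketch that keeps the two cases apart and guards against a missing node. The class and method names (ReviewParser, GetDescriptionFromFile, GetDescriptionFromMarkup, ExtractUsefulness) are hypothetical and only for illustration; the XPath is taken from the question and is assumed to match the saved page.

```csharp
using System;
using HtmlAgilityPack;

public static class ReviewParser
{
    // Hypothetical helper: 'path' is a file path on disk, not markup.
    public static string GetDescriptionFromFile(string path)
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;
        doc.Load(path); // reads and parses the file at 'path'
        return ExtractUsefulness(doc);
    }

    // Hypothetical helper: 'html' is the markup itself.
    public static string GetDescriptionFromMarkup(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;
        doc.LoadHtml(html); // parses the string as HTML
        return ExtractUsefulness(doc);
    }

    private static string ExtractUsefulness(HtmlDocument doc)
    {
        // XPath copied from the question; assumed to match the saved page.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//*[@id='tn15content']/div[1]/small[1]");

        // SelectSingleNode returns null when nothing matches,
        // so check before reading InnerHtml.
        return node == null ? null : node.InnerHtml;
    }

    public static void Main(string[] args)
    {
        string result = GetDescriptionFromFile("D:\\IMDb.htm");
        Console.WriteLine("res: {0}", result ?? "(no matching node)");
    }
}
```

Note that the question's Main has two related problems: doc.LoadHtml("D:\\IMDb.htm") parses the path string as markup, and Console.WriteLine("res", ...) uses "res" as a format string with no {0} placeholder, so only "res" would be printed.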

Upvotes: 1
