Extract a certain part of HTML with XPath and HtmlAgilityPack

Question

I am having trouble with XPath syntax because I don't understand how to use it to extract specific parts of the HTML. I am trying to load video information from a channel page: http://www.youtube.com/user/CinemaSins/videos

I know there is one line that holds all the details: views, title, ID, etc.

Here is what I am trying to get from within the HTML:

[Screenshot: the HTML line holding the video's details]

That's line 2836 of the page source.

I'm not sure how to proceed, but I have added HtmlAgilityPack to my project and have started trying to extract the data. Can someone explain how to get all of those details and the XPath syntax involved?

What I have attempted:

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='yt-lockup clearfix  yt-lockup-video yt-lockup-grid context-data-item']//a"))
{
    // Add the first child's HTML of every link found inside each video item
    if (node.ChildNodes[0].InnerHtml != String.Empty)
    {
        title.Add(node.ChildNodes[0].InnerHtml);
    }
}

The code above gets only the title of each video, but it also adds a blank entry. The output is shown below.

[Screenshot: the list of titles produced by the code above, including a blank entry]
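Since the output screenshot is unavailable, the following is only a guess at the cause: if each video item contains a second link with no text of its own (for example, a thumbnail link that wraps an image), `//a` matches that link too, and it contributes an empty entry. The sketch below shows the effect and one fix, an XPath predicate that keeps only links with non-whitespace text. It uses .NET's built-in `System.Xml` XPath engine on a small, hypothetical, well-formed sample so it runs without extra packages; the markup, class names, and IDs are invented for illustration and are not YouTube's real page.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical, simplified markup (not YouTube's real page): each video item
// holds a thumbnail link that wraps only an image, followed by a title link.
public static class TitleExtractor
{
    public const string SampleHtml =
        "<html><body>" +
        "<div class='yt-lockup yt-lockup-video' data-context-item-id='abc123'>" +
        "<a href='/watch?v=abc123'><img src='thumb1.jpg'/></a>" +
        "<a href='/watch?v=abc123'>First video</a>" +
        "</div>" +
        "<div class='yt-lockup yt-lockup-video' data-context-item-id='def456'>" +
        "<a href='/watch?v=def456'><img src='thumb2.jpg'/></a>" +
        "<a href='/watch?v=def456'>Second video</a>" +
        "</div>" +
        "</body></html>";

    // Runs an XPath query and returns the trimmed text of every matched node.
    private static List<string> TextsOf(string html, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(html);
        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
            result.Add(node.InnerText.Trim());
        return result;
    }

    // Like the question's query: "//a" matches every link inside each item,
    // so the text-less thumbnail link contributes an empty string.
    public static List<string> AllLinkTexts(string html) =>
        TextsOf(html, "//div[contains(@class, 'yt-lockup-video')]//a");

    // The predicate [normalize-space(.) != ''] keeps only links whose text is
    // not empty or whitespace-only, which drops the thumbnail links.
    public static List<string> Titles(string html) =>
        TextsOf(html, "//div[contains(@class, 'yt-lockup-video')]//a[normalize-space(.) != '']");
}
```

`TitleExtractor.AllLinkTexts(TitleExtractor.SampleHtml)` returns four strings, two of them empty, while `TitleExtractor.Titles(...)` returns only "First video" and "Second video". A real YouTube page is not well-formed XML, so with the live page you would keep using HtmlAgilityPack and pass the same XPath string to `doc.DocumentNode.SelectNodes(...)`, which returns `null` (not an empty list) when nothing matches. Note also that `contains(@class, ...)` is a plain substring test.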

Sam Clark-Ash · Accepted Answer

The answer I was given did not help at all, so after a lot of digging I finally understood how XPath works and managed to do it myself:

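var items = doc.DocumentNode.SelectNodes("//div[@class='yt-lockup clearfix  yt-lockup-video yt-lockup-grid context-data-item']");
if (items != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode node in items)
    {
        // Read the video ID from the element's data attribute ("" if absent)
        string val = node.GetAttributeValue("data-context-item-id", "");
        if (val != "") videoid.Add(val);
    }
}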

I just had to read the attribute value from the element with that class, rather than its inner content. Knowing this made it a lot easier to use.
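The same idea extends to several fields per video: select each item once, then run further queries relative to that item (a path starting with `.`) instead of against the whole document. The sketch below also swaps the exact `@class='...'` comparison, which only matches if the attribute text is identical character for character (including the double space in the question), for the common XPath 1.0 idiom that tests for a whole class token. As before, it runs on a hypothetical, well-formed sample with `System.Xml`; `VideoInfo`, the sample markup, and the IDs are invented for illustration. Other fields mentioned in the question, such as the view count, would be read the same way with another relative path, but their markup isn't shown here, so the sketch doesn't guess at it.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical container for the fields this sketch extracts.
public sealed class VideoInfo
{
    public string Id { get; }
    public string Title { get; }
    public VideoInfo(string id, string title) { Id = id; Title = title; }
}

public static class VideoExtractor
{
    // Hypothetical sample: two video items (one with the double space seen in the
    // question's class attribute) and one non-video item that must be skipped.
    public const string SampleHtml =
        "<html><body>" +
        "<div class='yt-lockup clearfix  yt-lockup-video yt-lockup-grid context-data-item' data-context-item-id='abc123'>" +
        "<a href='/watch?v=abc123'><img src='thumb1.jpg'/></a>" +
        "<a href='/watch?v=abc123'>First video</a>" +
        "</div>" +
        "<div class='yt-lockup-video yt-lockup' data-context-item-id='def456'>" +
        "<a href='/watch?v=def456'>Second video</a>" +
        "</div>" +
        "<div class='yt-lockup yt-lockup-playlist' data-context-item-id='pl789'>" +
        "<a href='/playlist?list=pl789'>A playlist</a>" +
        "</div>" +
        "</body></html>";

    // Matches "yt-lockup-video" as a whole class token, regardless of the
    // spacing or order of the other classes in the attribute.
    private const string ItemXPath =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' yt-lockup-video ')]";

    public static List<VideoInfo> Extract(string html)
    {
        var doc = new XmlDocument();
        doc.LoadXml(html);
        var videos = new List<VideoInfo>();
        foreach (XmlNode item in doc.SelectNodes(ItemXPath))
        {
            // Attribute value on the item itself, as in the accepted answer
            // (GetAttribute returns "" when the attribute is missing).
            string id = ((XmlElement)item).GetAttribute("data-context-item-id");
            // The leading "." makes the path relative to this item rather than
            // the whole document; take the first link that has visible text.
            var titleLink = item.SelectSingleNode(".//a[normalize-space(.) != '']");
            string title = titleLink == null ? "" : titleLink.InnerText.Trim();
            videos.Add(new VideoInfo(id, title));
        }
        return videos;
    }
}
```

On the sample, `VideoExtractor.Extract(VideoExtractor.SampleHtml)` yields two entries (`abc123`/"First video" and `def456`/"Second video") and skips the playlist item. With HtmlAgilityPack the structure should carry over: the same XPath strings go to `SelectNodes` and `SelectSingleNode` on an `HtmlNode`, and `GetAttributeValue("data-context-item-id", "")` replaces `GetAttribute`.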
