Conor

Reputation: 781

Retrieving deep nested values looping through a HTML page using HTMLAgilityPack C#

I'm trying to use the HTMLAgilityPack to retrieve several specific values from a web page. The page structure is always the same, and the data I want to scrape is always in the same place (same divs/classes/attributes etc).

I've tried to loop through and get the values, but I always mess up somewhere. I'd provide some code to help, but honestly I've tried 5 times and each time the results aren't close to what I want. I'm well and truly in a pickle.

I have written the main chunk of HTML:

<div id="markers">
   <div class="row">
      <div class="span2 filter-pane   ">
         <div class="teaser teaser-small">
            <h1 class="teaser-title">
               <a href="#map" data-lat="Value1" data-lng="Value2" data-name="Value3">...</a>
            </h1>
         </div>
         <p> Value4 </p>
      </div>
      <div class="span2 filter-pane   ">
      </div>
      <div class="span2 filter-pane   ">
      </div>
   </div>
   <div class="row"></div>
   <div class="row"></div>
</div>

Basically the values (1-4) are the values I want to extract from the data.

There is only ONE <div id="markers"> on the page, and all the information I need is inside it.

There are multiple <div class="row"> divs, and I need to loop through all of them.

Inside each of these divs, there are three or fewer <div class="span2 filter-pane "> divs. I need to loop through these as well.

My data is inside each of these: Value4 is in the <p>...</p>, and the other values (Value1 to Value3) are attributes of the <a> element inside the <h1 class="teaser-title"> node.

I hope somebody can provide me with a solution, or at least some good guidance to accessing all pieces of data I want. I've tried various things but I don't get the results I want.

Thanks.

Upvotes: 0

Views: 907

Answers (1)

Hung Cao

Reputation: 3208

Here are some hints. First, get div#markers, since you mentioned that it contains all the info you need.

// Requires "using System.Linq;" and the HtmlAgilityPack NuGet package
string mainURL = "your url"; // placeholder: put the page URL here
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(mainURL);
var markerDiv = doc.DocumentNode.Descendants("div").FirstOrDefault(n => n.Id == "markers");
// Check whether markerDiv is null before going further
// Same idea: get the list of row divs
// (HtmlNode.HasClass exists in recent HtmlAgilityPack versions; on older ones you can write your own, it's simple)
var rows = markerDiv.Descendants("div").Where(d => d.HasClass("row"));
// Iterate through the rows
foreach (var row in rows)
{
    // Add more criteria here if a row has more than one a element
    var aElement = row.Descendants("a").FirstOrDefault();
    // Returns Value1; do the same thing for the other attributes and the p
    var lat = aElement?.GetAttributeValue("data-lat", "");
}
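
Putting the hints above together, here is a rough sketch of the full nested loop (markers, then rows, then filter panes). It assumes every non-empty pane has one a element and one p, as in the question's markup. The names MarkerScraper and Extract and the sample HTML in Main are made up for illustration, and HasClass assumes a reasonably recent HtmlAgilityPack version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class MarkerScraper
{
    // Walks div#markers -> div.row -> div.filter-pane and collects Value1..Value4.
    public static List<(string Lat, string Lng, string Name, string Text)> Extract(HtmlDocument doc)
    {
        var results = new List<(string Lat, string Lng, string Name, string Text)>();

        var markerDiv = doc.DocumentNode.Descendants("div")
                           .FirstOrDefault(n => n.Id == "markers");
        if (markerDiv == null) return results; // page layout not as expected

        foreach (var row in markerDiv.Descendants("div").Where(d => d.HasClass("row")))
        {
            foreach (var pane in row.Descendants("div").Where(d => d.HasClass("filter-pane")))
            {
                var a = pane.Descendants("a").FirstOrDefault();
                if (a == null) continue; // skip empty panes

                var p = pane.Descendants("p").FirstOrDefault();
                results.Add((
                    a.GetAttributeValue("data-lat", ""),
                    a.GetAttributeValue("data-lng", ""),
                    a.GetAttributeValue("data-name", ""),
                    p?.InnerText.Trim() ?? ""));
            }
        }
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample that mirrors the structure described in the question.
        const string html = @"
<div id=""markers"">
  <div class=""row"">
    <div class=""span2 filter-pane"">
      <div class=""teaser teaser-small"">
        <h1 class=""teaser-title"">
          <a href=""#map"" data-lat=""Value1"" data-lng=""Value2"" data-name=""Value3"">...</a>
        </h1>
      </div>
      <p> Value4 </p>
    </div>
    <div class=""span2 filter-pane""></div>
  </div>
  <div class=""row""></div>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (var m in MarkerScraper.Extract(doc))
            Console.WriteLine($"{m.Lat}, {m.Lng}, {m.Name}, {m.Text}");
    }
}
```

For the real page, pass the document returned by web.Load(mainURL) to Extract instead of the sample string. Descendants is used rather than direct-child lookups so the loop tolerates extra wrapper elements, at the cost of also matching any nested div that happens to carry the same class.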

Hope it helps

Upvotes: 1
