Reputation: 781
I'm trying to use the HtmlAgilityPack to retrieve several specific values from a web page. The web page is always the same and the data I want to scrape from it is always in the same place (same divs/classes/attributes etc).
I've tried to loop through and get the values, but I always mess up somewhere. I'd provide some code to help, but honestly I've tried 5 times and each time the results aren't close to what I want - I'm well and truly in a pickle.
I have written the main chunk of HTML:
<div id ="markers">
<div class="row">
<div class="span2 filter-pane ">
<div class="teaser teaser-small">
<h1 class="teaser-title">
<a href="#map" data-lat="Value1" data-lng="Value2" data-name="Value3">...</a>
</h1>
<p> Value4 </p>
</div>
</div>
<div class="span2 filter-pane ">
</div>
<div class="span2 filter-pane ">
</div>
</div>
<div class="row"></div>
<div class="row"></div>
</div>
Basically the values (1-4) are the values I want to extract from the data.
The <div id="markers"> is ONE div on the page; all the information I need is in this div.
There are multiple <div class="row"> divs, and I need to loop through all of them.
Inside each of these divs, there are three or fewer <div class="span2 filter-pane "> divs. I need to loop through these divs as well.
My data is inside here: Value4 is in the <p>...</p>, and the other values (Value1 to Value3) can be found within the <h1 class="teaser-title"> node, where they are attributes on an <a> element.
I hope somebody can provide me with a solution, or at least some good guidance on accessing all the pieces of data I want. I've tried various things but I don't get the results I want.
Thanks.
Upvotes: 0
Views: 907
Reputation: 3208
Here are some hints for you. First you need to get div#markers, because you mentioned that it contains all the info you need.
// Requires: using System.Linq; using HtmlAgilityPack;
string mainURL = "your url"; // placeholder, put the page address here
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(mainURL);
// FirstOrDefault returns null when no div with id="markers" exists
var markerDiv = doc.DocumentNode.Descendants("div").FirstOrDefault(n => n.Id == "markers");
if (markerDiv == null) return; // nothing to scrape
// Same idea, get the list of row divs ("row" must match a whole class name)
var rows = markerDiv.Descendants("div").Where(d => d.GetAttributeValue("class", "").Split(' ').Contains("row"));
// Iterate through the rows
foreach (var row in rows)
{
    // You can add more criteria here if a row has more than one <a> element
    var aElement = row.Descendants("a").FirstOrDefault();
    if (aElement == null) continue; // empty row
    string lat = aElement.GetAttributeValue("data-lat", ""); // Value1; do the same for the other attributes and the <p>
}
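Putting the hints above together, a fuller sketch might look like the following. It is only an illustration under some assumptions: the class names `Marker` and `MarkerScraper`, the `HasCssClass` helper (standing in for the class check mentioned above) and the `Extract` method are all made up for this example, and the inline HTML is a trimmed copy of the markup from the question with the `<h1>` closed and single-quoted attributes. It loops over every row, then over every filter-pane inside that row, and reads the three `data-` attributes plus the `<p>` text from each pane.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical container for the four values of one pane.
public class Marker
{
    public string Lat { get; set; }
    public string Lng { get; set; }
    public string Name { get; set; }
    public string Text { get; set; }
}

public static class MarkerScraper
{
    // Hypothetical helper: true if the node's class attribute contains
    // className as a whole token (so "row" does not match "rowspan").
    static bool HasCssClass(HtmlNode node, string className)
    {
        return node.GetAttributeValue("class", "")
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Contains(className);
    }

    public static List<Marker> Extract(HtmlDocument doc)
    {
        var results = new List<Marker>();

        // The single div#markers that holds all the data.
        var markers = doc.DocumentNode.Descendants("div")
            .FirstOrDefault(n => n.GetAttributeValue("id", "") == "markers");
        if (markers == null) return results;

        // Outer loop: rows. Inner loop: the panes inside each row.
        foreach (var row in markers.Elements("div").Where(d => HasCssClass(d, "row")))
        {
            foreach (var pane in row.Elements("div").Where(d => HasCssClass(d, "filter-pane")))
            {
                var a = pane.Descendants("a").FirstOrDefault();
                if (a == null) continue; // skip empty panes

                var p = pane.Descendants("p").FirstOrDefault();
                results.Add(new Marker
                {
                    Lat = a.GetAttributeValue("data-lat", ""),
                    Lng = a.GetAttributeValue("data-lng", ""),
                    Name = a.GetAttributeValue("data-name", ""),
                    Text = p == null ? "" : p.InnerText.Trim()
                });
            }
        }
        return results;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(@"
<div id='markers'>
  <div class='row'>
    <div class='span2 filter-pane '>
      <div class='teaser teaser-small'>
        <h1 class='teaser-title'>
          <a href='#map' data-lat='Value1' data-lng='Value2' data-name='Value3'>...</a>
        </h1>
        <p> Value4 </p>
      </div>
    </div>
    <div class='span2 filter-pane '></div>
  </div>
  <div class='row'></div>
</div>");

        foreach (var m in Extract(doc))
            Console.WriteLine(m.Lat + ", " + m.Lng + ", " + m.Name + ", " + m.Text);
    }
}
```

For the real page you would replace the `LoadHtml` call with `new HtmlWeb().Load(mainURL)`. Using `Elements("div")` for the rows and panes only looks at direct children, which keeps the nested teaser divs from being picked up by accident; if the real markup has extra wrapper elements, `Descendants("div")` may be the better fit.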
Hope it helps
Upvotes: 1