Reputation: 13
I have read some posts on this topic and tried to implement the answers, but I don't get the output I want.
This is the HTML:
<div class="span-8">
<dl>
<dt>
<a title="A Coruña" href="http://www.paginasamarillas.es/all_a-coru%C3%B1a_.html"> A Coruña</a>
</dt>
<dt>
<a title="Álava" href="http://www.paginasamarillas.es/all_alava_.html"> Álava</a>
</dt>
<dt>
<a title="Albacete" href="http://www.paginasamarillas.es/all_albacete_.html"> Albacete</a>
</dt>
<dt>
<a title="Alicante" href="http://www.paginasamarillas.es/all_alicante_.html"> Alicante</a>
</dt>
...
...
I want to get "Barcelona", "Alicante", "Albacete", etc. So I tried the following code:
var nodos = doc.DocumentNode.SelectNodes("//div[@class='container']");
and
var nodos = doc.DocumentNode.SelectNodes("//a[@title]");
or
var nodos = doc.DocumentNode.SelectNodes("//div[@class='span-8']");
But it doesn't work: it's as if the class "container", the attribute "title", or the class "span-8" didn't exist on the page. I also tried other variants. There are other "div" elements with the class 'container' and other "a" elements with a 'title' attribute elsewhere in the page, and those are extracted fine, but they are not what I want.
EDIT
Sorry, I explained it badly. It is not a single word but a group of data. I have modified the HTML code above.
Upvotes: 0
Views: 653
Reputation: 460138
I have tested your sample HTML and it works:
using System.Collections.Generic;
using System.Linq;

string html = @"<div class=""container"">
<div class=""span-24"">
<div class=""span-8"">
<dl>
<dt>
<a title=""A Coruña"" href=""http://www.example.com/all_example.html""> Barcelona</a>
</dt>
</dl>
</div>
</div>
</div>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// SelectSingleNode returns null when the XPath matches nothing
var div = doc.DocumentNode.SelectSingleNode("//div[@class='span-8']");
if (div != null)
{
    // Text of every <a> anywhere under the div
    List<string> linkTexts = div.Descendants("a")
        .Select(a => a.InnerText)
        .ToList(); // one item " Barcelona"
}
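If the real page still returns nothing, one thing worth ruling out is that @class='span-8' only matches when the attribute is exactly that string; a div with several classes (say class="span-8 last", a hypothetical example) would be skipped. Below is a minimal sketch, assuming the HtmlAgilityPack package is referenced, that matches "span-8" as a whitespace-separated class token, selects every dt/a under that div in one XPath, and trims the leading space from each name. The names ProvinceLinks and ExtractNames are hypothetical, not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ProvinceLinks
{
    // Matches any <div> whose class list contains the token "span-8"
    // (e.g. class="span-8" or class="span-8 last"), then every <a> directly inside a <dt>.
    private const string SpanEightLinks =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' span-8 ')]//dt/a";

    // Hypothetical helper: returns the trimmed, entity-decoded link texts.
    public static List<string> ExtractNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // By default SelectNodes returns null (not an empty collection) when nothing matches
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(SpanEightLinks);
        if (nodes == null)
            return new List<string>();

        return nodes
            .Select(a => HtmlEntity.DeEntitize(a.InnerText).Trim())
            .ToList();
    }

    public static void Main()
    {
        string html = @"<div class=""span-8 last"">
<dl>
<dt><a title=""A Coruña"" href=""http://www.paginasamarillas.es/all_a-coru%C3%B1a_.html""> A Coruña</a></dt>
<dt><a title=""Álava"" href=""http://www.paginasamarillas.es/all_alava_.html""> Álava</a></dt>
</dl>
</div>";

        // Prints one province name per line
        foreach (string name in ExtractNames(html))
            Console.WriteLine(name);
    }
}
```

The concat/normalize-space idiom is needed because XPath 1.0 has no class selector; padding both sides with spaces stops "span-8" from matching "span-80". Note also that HtmlAgilityPack only parses the HTML string it is given and does not run JavaScript, so if the names are inserted by a script, no XPath will find them in the downloaded source.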
Upvotes: 1