Reputation: 53
This is my sample page. I want to collect the inner text of all the a tags into one string. I wrote code for that, but it doesn't work correctly.
<body>
  <div id="infor">
    <div id="genres">
      <a href="#" >Animation</a>
      <a href="#" >Short</a>
      <a href="#" >Action</a>
    </div>
  </div>
</body>
This is the code I used to build that string:
class Values
{
    private HtmlAgilityPack.HtmlDocument _markup;
    HtmlWeb web = new HtmlWeb(); // create the HtmlWeb object
    form1 frm = new form1();
    _markup = web.Load("mypage.html"); // load the page
    public string Genres
    {
        get
        {
            HtmlNodeCollection headers = _markup.DocumentNode.SelectNodes("//div[contains(@id, 'infor')]/a"); // select all <a> tags in <div id="infor">
            if (headers != null)
            {
                string genres = "";
                foreach (HtmlNode header in headers) // I'm not sure what happens here.
                {
                    HtmlNode genre = header.ParentNode.SelectSingleNode(".//a[contains(@href, '#')]"); // I think the error occurs here...
                    if (genre != null)
                    {
                        genres += genre.InnerText + ", ";
                    }
                }
                return genres;
            }
            return String.Empty;
        }
    }
    frm.text1.text=Genres;
}
The value shown in text1 (the return value) is:
Animation, Animation, Animation,
But I want output like this:
Animation, Short, Action,
Upvotes: 3
Views: 537
Reputation: 12768
A little LINQ combined with Descendants will get you there more easily, I think.
// Requires "using System.Linq;" at the top of the file.
// The id in the sample markup is "genres".
var genreNode = _markup.DocumentNode.Descendants("div").FirstOrDefault(n => n.Id.Equals("genres"));
if (genreNode != null)
{
    // Pull every <a> node under the genres div, project each one to its inner text,
    // then join the results using ", " as the separator.
    return string.Join(", ", genreNode.Descendants("a")
        .Where(n => n.GetAttributeValue("href", string.Empty).Equals("#"))
        .Select(n => n.InnerText).ToArray());
}
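For anyone who wants to try this approach outside the original class, here is a minimal self-contained sketch. It assumes the HtmlAgilityPack package is referenced and loads the question's sample markup from a string rather than from mypage.html; the class name JoinGenresDemo and the html constant are made up for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class JoinGenresDemo
{
    static void Main()
    {
        // Sample markup from the question, inlined so the sketch needs no file (assumption).
        const string html = @"<div id=""infor""><div id=""genres"">
<a href=""#"">Animation</a><a href=""#"">Short</a><a href=""#"">Action</a>
</div></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Find the first <div> whose id attribute is exactly "genres".
        HtmlNode genreNode = doc.DocumentNode
            .Descendants("div")
            .FirstOrDefault(n => n.GetAttributeValue("id", string.Empty) == "genres");

        // Join the text of every <a> below that div; fall back to an empty string.
        string genres = genreNode == null
            ? string.Empty
            : string.Join(", ", genreNode.Descendants("a").Select(a => a.InnerText.Trim()));

        Console.WriteLine(genres);
    }
}
```

Note that string.Join only puts the separator between items, so the result has no trailing ", ". If the trailing comma shown in the question is really wanted, it would have to be appended separately.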
Upvotes: 1
Reputation: 43743
It looks like your problem is the header.ParentNode.SelectSingleNode(".//a[contains(@href, '#')]") statement. It takes you back up to the parent div element and then finds the first a element that matches the criteria, which is always the same one. You already have the a node, so you could just check its attributes via its properties rather than doing another select. However, there is no need for a second select at all when a single select can narrow the results down in the first place, such as:
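// "//a" matches <a> elements at any depth under the div, which covers the nested
// <div id="genres"> in the sample markup; "/a" would match direct children only.
HtmlNodeCollection headers = _markup.DocumentNode.SelectNodes("//div[contains(@id, 'infor')]//a[contains(@href, '#')]");
if (headers != null)
{
    string genres = "";
    foreach (HtmlNode header in headers) // each node is already a matching <a> element
    {
        genres += header.InnerText + ", ";
    }
    return genres;
}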
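To see the repeated-value effect in isolation, the following sketch prints, for each link, the node found by searching from its parent next to the link itself. It is only an illustration under some assumptions: the HtmlAgilityPack package is referenced, the question's sample markup is inlined as a string, and the class name ParentSelectDemo is invented for this example.

```csharp
using System;
using HtmlAgilityPack;

class ParentSelectDemo
{
    static void Main()
    {
        // Sample markup from the question, inlined so the sketch needs no file (assumption).
        const string html = @"<div id=""infor""><div id=""genres"">
<a href=""#"">Animation</a><a href=""#"">Short</a><a href=""#"">Action</a>
</div></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // One select that already narrows the result to the wanted <a> elements.
        HtmlNodeCollection links =
            doc.DocumentNode.SelectNodes("//div[@id='infor']//a[contains(@href, '#')]");
        if (links == null)
        {
            Console.WriteLine("No matching <a> elements found.");
            return;
        }

        foreach (HtmlNode link in links)
        {
            // Searching again from the shared parent restarts at the same div every time,
            // so SelectSingleNode yields that parent's first matching <a>.
            HtmlNode viaParent = link.ParentNode.SelectSingleNode(".//a[contains(@href, '#')]");

            Console.WriteLine("via parent: {0} | current node: {1}",
                viaParent.InnerText, link.InnerText);
        }
    }
}
```

The "via parent" column should stay on the first link while the "current node" column advances, which is why reading InnerText from the loop variable directly is enough.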
Upvotes: 1