hKumudu

Reputation: 53

How to get the InnerText of multiple <a> tags?

This is my sample page:

<body>
    <div id="infor">
        <div id="genres">
            <a href="#" >Animation</a>
            <a href="#" >Short</a>
            <a href="#" >Action</a>
        </div>
    </div>
</body>

I want to combine the inner text of all the <a> tags into one string. I used this code to do that, but it doesn't work correctly:

class Values
{
    private HtmlAgilityPack.HtmlDocument _markup;

    HtmlWeb web = new HtmlWeb(); //creating object of HtmlWeb
    form1 frm = new form1;

    _markup = web.Load("mypage.html"); // load page

    public string Genres
    {
        get
        {
            HtmlNodeCollection headers = _markup.DocumentNode.SelectNodes("//div[contains(@id, 'infor')]/a"); // I filter all of <a> tags in <div id="infor">
            if (headers != null)
            {
                string genres = "";
                foreach (HtmlNode header in headers) // I'm not sure what happens here. 
                {
                    HtmlNode genre = header.ParentNode.SelectSingleNode(".//a[contains(@href, '#')]"); //I think an error occurred in here... 
                    if (genre != null)
                    {
                        genres += genre.InnerText + ", ";
                    }
                }
                return genres;
            }
            return String.Empty;
        }
    }

    frm.text1.text=Genres;
}

The value that ends up in text1 (the return value) is:

Animation, Animation, Animation,

But I want output like this:

Animation, Short, Action,

Upvotes: 3

Views: 537

Answers (2)

Jacob Proffitt

Reputation: 12768

A little LINQ with Descendants will get you there more easily, I think.

var genreNode = _markup.DocumentNode.Descendants("div").FirstOrDefault(n => n.Id == "genres"); // needs using System.Linq;
if (genreNode != null)
{
    // this pulls all <a> nodes under the genres div and pops their inner text into an array,
    // then joins that array using ", " as the separator.
    return string.Join(", ", genreNode.Descendants("a")
        .Where(n => n.GetAttributeValue("href", string.Empty) == "#")
        .Select(n => n.InnerText).ToArray());
}
return string.Empty;
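If you want to try the same LINQ idea without a web page or the HtmlAgilityPack package, here is a minimal sketch that swaps in LINQ to XML from the .NET base class library. It only works because the sample markup happens to be well-formed XML; real HTML pages usually are not, so for those you still need HtmlAgilityPack. The class name GenreLinqDemo and method JoinGenres are hypothetical, and the id "genres" is taken from the sample markup.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: LINQ to XML analog of the HtmlAgilityPack LINQ query above.
// Assumes the input is well-formed XML (true for the sample markup, not for most real HTML).
public static class GenreLinqDemo
{
    public static string JoinGenres(string markup)
    {
        XDocument doc = XDocument.Parse(markup);

        // Find <div id="genres">; the (string) cast yields null when the attribute is missing.
        var genreNode = doc.Descendants("div")
            .FirstOrDefault(d => (string)d.Attribute("id") == "genres");
        if (genreNode == null)
        {
            return string.Empty;
        }

        // string.Join only inserts ", " between items, so there is no trailing separator.
        return string.Join(", ", genreNode.Descendants("a")
            .Where(a => (string)a.Attribute("href") == "#")
            .Select(a => a.Value));
    }
}
```

For the sample page this returns "Animation, Short, Action", without the trailing ", " that the += loop in the question leaves behind.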

Upvotes: 1

Steven Doggart

Reputation: 43743

It looks like your problem is the header.ParentNode.SelectSingleNode(".//a[contains(@href, '#')]") statement. It takes you back up to the parent div element and then finds the first a element that matches the criteria, which is always the same one (Animation). You already have the a node, so you could just check its attributes through its properties rather than doing another select. However, there's no need for a second select at all when a single select can narrow the results down in the first place, such as:

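// "//a" (descendant) rather than "/a" (child), because in the sample markup the <a> tags sit inside <div id="genres">
HtmlNodeCollection headers = _markup.DocumentNode.SelectNodes("//div[contains(@id, 'infor')]//a[contains(@href, '#')]");
if (headers != null)
{
    string genres = "";
    foreach (HtmlNode header in headers) // each node is already one matching <a>, so read its own text
    {
        genres += header.InnerText + ", ";
    }
    return genres;
}
return string.Empty;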
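To see the difference between the two selections side by side, here is a minimal sketch that reproduces both the question's loop and the single-select fix. It swaps HtmlAgilityPack for the .NET base class library's XPath support (System.Xml.XPath over LINQ to XML), which evaluates the same XPath expressions but only accepts well-formed XML such as the sample markup. It uses //a (descendant) because the <a> tags in the sample are inside div#genres. The class and method names are hypothetical.

```csharp
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical demo: same XPath as the question/answer, evaluated with the BCL instead of HtmlAgilityPack.
public static class GenreXPathDemo
{
    public const string SampleMarkup =
        "<body><div id=\"infor\"><div id=\"genres\">" +
        "<a href=\"#\">Animation</a><a href=\"#\">Short</a><a href=\"#\">Action</a>" +
        "</div></div></body>";

    // Mirrors the question's loop: for every <a>, climb to the parent <div>
    // and select the FIRST matching <a> again, which is always "Animation".
    public static string Buggy(string markup)
    {
        XDocument doc = XDocument.Parse(markup);
        string genres = "";
        foreach (XElement header in doc.XPathSelectElements("//div[contains(@id, 'infor')]//a"))
        {
            var genre = header.Parent.XPathSelectElement(".//a[contains(@href, '#')]");
            if (genre != null)
            {
                genres += genre.Value + ", ";
            }
        }
        return genres;
    }

    // The answer's fix: filter once in the XPath, then read each node's own text.
    public static string Fixed(string markup)
    {
        XDocument doc = XDocument.Parse(markup);
        string genres = "";
        foreach (XElement header in doc.XPathSelectElements("//div[contains(@id, 'infor')]//a[contains(@href, '#')]"))
        {
            genres += header.Value + ", ";
        }
        return genres;
    }
}
```

With the sample markup, Buggy returns "Animation, Animation, Animation, " (the output from the question) and Fixed returns "Animation, Short, Action, ".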

Upvotes: 1
