user3783579

Reputation: 17

htmlagilitypack extracting emails

I am using the following code to extract all the links on a page with HtmlAgilityPack. When I enter the URL https://htmlagilitypack.codeplex.com/ the code works fine, and the URLs are extracted and displayed correctly. But when I enter another URL, such as https://htmlagilitypack.codeplex.com/discussions/12447, I get the error "Object reference not set to an instance of an object". The error occurs on this line:

OutputLabel.Text += counter + ". " + aTag.InnerHtml + " - " + 
                    aTag.Attributes["href"].Value + "\t" + "<br />"; 

Please help me out. It may be a minor mistake to you, but please don't downvote it. Here is the full code:

var getHtmlWeb = new HtmlWeb();
var document = getHtmlWeb.Load(InputTextBox.Text);
var aTags = document.DocumentNode.SelectNodes("//a");
int counter = 1;

if (aTags != null)
{
    foreach (var aTag in aTags)
    {
        OutputLabel.Text += counter + ". " + aTag.InnerHtml + " - " + 
                            aTag.Attributes["href"].Value + "\t" + "<br />"; 
        counter++;
    }
}

Upvotes: 1

Views: 489

Answers (1)

Sergey Berezovskiy

Reputation: 236268

It looks like some of the anchors do not have an href attribute. For example, the given page contains this anchor:

<a name="post40566"></a>

So aTag.Attributes["href"] returns null, and you get an exception when you try to read the Value of that attribute. You can change the XPath to select only those anchors that have this attribute:

document.DocumentNode.SelectNodes("//a[@href]");

Or verify that the attribute exists before accessing its value:

if (aTag.Attributes["href"] != null)
    // ...

A third option is to use the GetAttributeValue method and provide a default value to be displayed for missing attributes:

aTag.GetAttributeValue("href", "N/A")
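
For reference, here is a minimal sketch of how the third option might fit into the loop from the question. It is a console program rather than a web page, so Console.WriteLine stands in for OutputLabel, and the inline HTML string is a made-up sample (an assumption, not the real page) containing one anchor with href and one without:

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup: the second anchor has no href attribute.
        var html = "<a href=\"/home\">Home</a><a name=\"post40566\"></a>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        var aTags = document.DocumentNode.SelectNodes("//a");
        int counter = 1;

        // SelectNodes returns null when nothing matches, so keep the null check.
        if (aTags != null)
        {
            foreach (var aTag in aTags)
            {
                // Returns "N/A" instead of throwing when href is missing.
                var href = aTag.GetAttributeValue("href", "N/A");
                Console.WriteLine(counter + ". " + aTag.InnerHtml + " - " + href);
                counter++;
            }
        }
    }
}
```

If you would rather skip anchors without href entirely, replacing "//a" with "//a[@href]" in this sketch should filter them out before the loop runs.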

Upvotes: 4
