m44m31

Reputation: 113

GetElementsByTagName in C#

I have this piece of code:

string x = textBox1.Text;
string[] list = x.Split(';');
foreach (string u in list)
{
    string url = "http://*********/index.php?n=" + u;
    webBrowser1.Navigate(url);
    webBrowser1.Document.GetElementsByTagName("META");
}

I'm trying to show the <META> tags in a message box, but when I test it, I keep getting this error:

Object reference not set to an instance of an object.

Upvotes: 0

Views: 9487

Answers (3)

Luciano Carvalho

Reputation: 1739

You can retrieve META tags and any other HTML element directly from your WebBrowser control; there is no need for the HTML Agility Pack or any other component.

As Mark said, first wait for the DocumentCompleted event:

webBrowser.DocumentCompleted += WebBrowser_DocumentCompleted;

Then you can read any element and its content from the HTML document. The following code gets the title and the meta description:

private void WebBrowser_DocumentCompleted(object sender, System.Windows.Forms.WebBrowserDocumentCompletedEventArgs e)
{
    // The sender is the WebBrowser that raised the event.
    System.Windows.Forms.WebBrowser browser = (System.Windows.Forms.WebBrowser)sender;
    string title = browser.Document.Title;
    string description = String.Empty;
    foreach (HtmlElement meta in browser.Document.GetElementsByTagName("META"))
    {
        // Compare the name attribute case-insensitively.
        if (String.Equals(meta.Name, "description", StringComparison.OrdinalIgnoreCase))
        {
            description = meta.GetAttribute("content");
        }
    }
}
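
The loop in the question also calls Navigate for every item back to back, and a new Navigate call generally replaces a navigation that is still in progress. One way around that is to queue the URLs and start the next one only after the previous document has completed. The following is a rough sketch under assumptions, not a tested drop-in: MetaForm, StartLoading, NavigateNext and the pending queue are made-up names, the controls are created in code instead of the designer, and the ReadyState check is a common heuristic for skipping frame-level events.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;

// Hypothetical form; webBrowser1 and textBox1 mirror the control names in the question.
public class MetaForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    private readonly TextBox textBox1 = new TextBox { Dock = DockStyle.Top };
    private readonly Queue<string> pending = new Queue<string>();

    public MetaForm()
    {
        Controls.Add(webBrowser1);
        Controls.Add(textBox1);
        webBrowser1.DocumentCompleted += WebBrowser1_DocumentCompleted;
    }

    // Call this from a button click handler, for example.
    public void StartLoading()
    {
        foreach (string u in textBox1.Text.Split(';'))
        {
            // The host is elided in the question and left as-is here.
            pending.Enqueue("http://*********/index.php?n=" + u);
        }
        NavigateNext();
    }

    // Starts the next queued navigation, if any.
    private void NavigateNext()
    {
        if (pending.Count > 0)
        {
            webBrowser1.Navigate(pending.Dequeue());
        }
    }

    private void WebBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can also fire for frames; wait until the whole page is done.
        if (webBrowser1.ReadyState != WebBrowserReadyState.Complete)
        {
            return;
        }

        // Collect the name/content pair of every <meta> tag on the page.
        StringBuilder text = new StringBuilder();
        foreach (HtmlElement meta in webBrowser1.Document.GetElementsByTagName("META"))
        {
            text.AppendLine(meta.GetAttribute("name") + ": " + meta.GetAttribute("content"));
        }
        MessageBox.Show(text.ToString());

        // Only now move on to the next URL.
        NavigateNext();
    }
}
```

Because the Document is only read inside the handler, it is never accessed before a page has loaded, which is what caused the null reference in the first place.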

Upvotes: 0

Ry-

Reputation: 225272

Your problem is that you're accessing the Document object before the document has loaded: Navigate returns immediately and the WebBrowser loads the page asynchronously. If you only need the markup, skip the WebBrowser and parse the HTML using a library like the HTML Agility Pack.

Here's how you might get the <meta> tags using the HTML Agility Pack. (Assumes using System.Net; and using HtmlAgilityPack;.)

// Create a WebClient to use to download the string:
using(WebClient wc = new WebClient()) {
    // Create a document object
    HtmlDocument d = new HtmlDocument();

    // Download the content and parse the HTML:        
    d.LoadHtml(wc.DownloadString("http://stackoverflow.com/questions/10368605/getelementsbytagname-in-c-sharp/10368631#10368631"));

    // Loop through all the <meta> tags:
    foreach(HtmlNode metaTag in d.DocumentNode.Descendants("meta")) {
        // It's a <meta> tag! Do something with it.
    }
}

Upvotes: 3

Mark Byers

Reputation: 839144

You shouldn't try to access the document until it has finished loading. Run that code inside a handler for the DocumentCompleted event.

But Matti is right. If all you need is to read the HTML, you shouldn't be using a WebBrowser. Just fetch the text and parse it using an HTML parser.

Upvotes: 2
