Hirantha

Reputation: 81

How to extract the text value of a given attribute using XPath?

I want to extract the text within the content attribute using XPath.

<meta name="keywords" content="football,cricket,Rugby,Volleyball">

I want to select only "football,cricket,Rugby,Volleyball".

I'm using C# with HtmlAgilityPack.

This is how I tried to do it, but it did not work.

private void scrapBtn_Click(object sender, EventArgs e)
{
    string url = urlTextBox.Text;
    HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load(url);

    try
    {
        var node = doc.DocumentNode.SelectSingleNode("//head/title/text()");
        var node1 = doc.DocumentNode.SelectSingleNode("//head/meta[@name='DESCRIPTION']/@content");

        try
        {
            label4.Text = "Title:";
            label4.Text += "\t" + node.Name.ToUpper() + ": " + node.OuterHtml;
        }
        catch (NullReferenceException)
        {
            MessageBox.Show(url + "does not contain <Title>", "Oppz, Sorry");
        }

        try
        {
            label4.Text += "\nMeta Keywords:";
            label4.Text += "\n\t" + node1.Name.ToUpper() + ": " + node1.OuterHtml;
        }
        catch (NullReferenceException)
        {
            MessageBox.Show(url + "does not contain <meta='Keywords'>", "Oppz, Sorry");
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.StackTrace, "Oppz, Sorry");
    }
}

Upvotes: 0

Views: 4601

Answers (2)

Martin Honnen

Reputation: 167716

With HTML Agility Pack you can use doc.DocumentNode.SelectSingleNode("/html/head/meta[@name = 'keywords']").Attributes["content"].Value. I think its XPath support for attribute nodes is a bit odd, so it is better to select the element, then use the Attributes property to pick the attribute and the Value property to extract its value. If you want pure XPath to return the attribute value as a string, use doc.CreateNavigator().Evaluate("string(/html/head/meta[@name = 'keywords']/@content)").
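
As a rough illustration of the two approaches described in this answer, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name MetaContentDemo, the helper names, and the inline sample HTML are made up for the example and are not part of the library. The helpers return null or an empty string instead of throwing when the tag is missing, which is one way to avoid catching NullReferenceException.

```csharp
using System;
using HtmlAgilityPack;

public static class MetaContentDemo
{
    // Hypothetical helper: selects the <meta> element, then reads its attribute.
    // Returns null when no matching <meta> element exists.
    // Note: metaName is concatenated into the XPath unescaped, so pass trusted values only.
    public static string GetMetaContent(HtmlDocument doc, string metaName)
    {
        HtmlNode meta = doc.DocumentNode.SelectSingleNode(
            "//head/meta[@name='" + metaName + "']");
        if (meta == null)
        {
            return null;
        }
        // GetAttributeValue returns the supplied default if the attribute is absent.
        return meta.GetAttributeValue("content", string.Empty);
    }

    // Hypothetical helper: asks XPath itself for the string value of the attribute.
    // string() yields an empty string when nothing matches.
    public static string GetMetaContentViaXPath(HtmlDocument doc, string metaName)
    {
        object result = doc.CreateNavigator().Evaluate(
            "string(//head/meta[@name='" + metaName + "']/@content)");
        return (string)result;
    }

    public static void Main()
    {
        // Sample markup standing in for a page loaded with HtmlWeb.Load(url).
        const string html =
            "<html><head><title>Sports</title>" +
            "<meta name=\"keywords\" content=\"football,cricket,Rugby,Volleyball\">" +
            "</head><body></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        Console.WriteLine(GetMetaContent(doc, "keywords"));
        Console.WriteLine(GetMetaContentViaXPath(doc, "keywords"));
    }
}
```

The attribute value comparison in the predicate is case-sensitive, so a predicate such as @name='DESCRIPTION' will not match a tag written as name="description".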

Upvotes: 1

Neel

Reputation: 11741

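You can use string() to get just the value. Select the attribute itself, since attribute nodes have no text() children, and filter on the name you want:

string(//head/meta[@name='keywords']/@content)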
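
To see how string() behaves on an attribute, here is a small sketch that uses only the .NET System.Xml.XPath classes rather than HtmlAgilityPack, so it needs well-formed XML as input. The class name AttributeTextDemo, the Evaluate helper, and the sample markup are assumptions made for the example. It compares selecting the attribute directly with appending /text() to it; in XPath 1.0 an attribute node has no child nodes, so the second form selects nothing.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class AttributeTextDemo
{
    // Hypothetical helper: evaluates an XPath expression that returns a string.
    public static string Evaluate(string xml, string xpath)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        return (string)nav.Evaluate(xpath);
    }

    public static void Main()
    {
        // Well-formed stand-in for the <head> of an HTML page.
        const string xml =
            "<html><head>" +
            "<meta name=\"keywords\" content=\"football,cricket,Rugby,Volleyball\"/>" +
            "</head></html>";

        // Selecting the attribute itself gives its value.
        string direct = Evaluate(xml, "string(//head/meta[@name='keywords']/@content)");

        // Attributes have no text() children, so this node-set is empty
        // and string() turns it into an empty string.
        string viaText = Evaluate(xml, "string(//head/meta[@name='keywords']/@content/text())");

        Console.WriteLine("direct:  '" + direct + "'");
        Console.WriteLine("viaText: '" + viaText + "'");
    }
}
```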

Upvotes: 0
