Reputation: 81
I want to extract the text of the content attribute using XPath.
<meta name="keywords" content="football,cricket,Rugby,Volleyball">
I want to select only "football,cricket,Rugby,Volleyball"
I'm using C# with HtmlAgilityPack.
This is how I tried to do it, but it did not work.
private void scrapBtn_Click(object sender, EventArgs e)
{
    string url = urlTextBox.Text;
    HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load(url);
    try
    {
        var node = doc.DocumentNode.SelectSingleNode("//head/title/text()");
        var node1 = doc.DocumentNode.SelectSingleNode("//head/meta[@name='DESCRIPTION']/@content");
        try
        {
            label4.Text = "Title:";
            label4.Text += "\t"+node.Name.ToUpper() + ": " + node.OuterHtml;
        }
        catch (NullReferenceException)
        {
            MessageBox.Show(url + "does not contain <Title>", "Oppz, Sorry");
        }
        try
        {
            label4.Text += "\nMeta Keywords:";
            label4.Text += "\n\t" + node1.Name.ToUpper() + ": " + node1.OuterHtml;
        }
        catch (NullReferenceException)
        {
            MessageBox.Show(url + "does not contain <meta='Keywords'>", "Oppz, Sorry");
        }
    }
    catch(Exception ex){
        MessageBox.Show(ex.StackTrace, "Oppz, Sorry");
    }
}
Upvotes: 0
Views: 4601
Reputation: 167716
With HTML Agility Pack you can use doc.DocumentNode.SelectSingleNode("/html/head/meta[@name = 'keywords']").Attributes["content"].Value. I think their XPath support for attribute nodes is a bit odd, so it is better to select the element, then use the Attributes property to select the attribute and the Value property to extract its value. If you want to use pure XPath to get the attribute value as a string, use doc.CreateNavigator().Evaluate("string(/html/head/meta[@name = 'keywords']/@content)").
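As a minimal sketch of the element-then-attribute approach, assuming the HtmlAgilityPack NuGet package is referenced: the class name MetaReader and the method GetMetaContent are hypothetical helpers, not part of the library. It adds null checks so that a missing element or attribute yields null instead of a NullReferenceException.

```csharp
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class MetaReader
{
    // Returns the content attribute of the first <meta name="..."> under <head>,
    // or null when the element or the attribute is missing.
    // Assumes metaName contains no single quote, since it is spliced into the XPath.
    public static string GetMetaContent(string html, string metaName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the element rather than the attribute node.
        HtmlNode meta = doc.DocumentNode.SelectSingleNode(
            "//head/meta[@name='" + metaName + "']");

        // Read the attribute through the Attributes collection.
        return meta?.Attributes["content"]?.Value;
    }
}
```

For the markup in the question, MetaReader.GetMetaContent(html, "keywords") should return "football,cricket,Rugby,Volleyball". Note that the comparison in the XPath predicate is case-sensitive, so 'DESCRIPTION' would not match a name attribute written as "description".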
Upvotes: 1
Reputation: 11741
You can use string() to get just the value:
string(//head/meta[@name='keywords']/@content)
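A string() expression returns a string rather than a node, so with HTML Agility Pack it would typically be evaluated through an XPathNavigator instead of being passed to SelectSingleNode. The sketch below assumes the HtmlAgilityPack NuGet package; MetaXPath and EvaluateKeywords are hypothetical names used only for illustration.

```csharp
using System.Xml.XPath;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class MetaXPath
{
    // Evaluates a pure-XPath string() expression against the parsed HTML.
    // Yields an empty string when no matching meta element exists.
    public static string EvaluateKeywords(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        XPathNavigator navigator = doc.CreateNavigator();

        // Evaluate returns object; for a string() expression it is a string.
        return (string)navigator.Evaluate(
            "string(//head/meta[@name='keywords']/@content)");
    }
}
```

Unlike the Attributes-based approach, this returns an empty string instead of null when nothing matches, so the caller cannot distinguish a missing tag from an empty content attribute.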
Upvotes: 0