Reputation: 33
<a class="support" style="letter-spacing: -1px" href="/support/index.php?/Knowledgebase/List/updates" data-executing="0">I'm random</a>
I'm trying to scrape the text of the link above using XPath. The link text ("I'm random")
changes all the time; the rest of the element stays the same. That text
is what I want to scrape.
I don't really understand XPath. How would I pull out just the inner text? I have tried:
string html = Web.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);
var Attributes = new List<string>();
var Randomtxt = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
if (Randomtxt != null)
{
foreach (var contents in Randomtxt)
{
string href = contents.InnerHtml;
var parts = href.Split(new char[] { '=' }, StringSplitOptions.RemoveEmptyEntries);
if (parts.Length > 1)
{
Attributes.Add(parts[1]);
}
}
Attribute.DataSource = Attributes;
}
But it returns nothing at all. How would I go about getting just the inner text?
Upvotes: 0
Views: 1198
Reputation: 33
This isn't XPath, but it works for what I want to do. Problem solved.
List<string> Attributes = new List<string>();
string html = Web.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
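// Match the end of the data-executing attribute, then capture only the text before the closing </a>.
MatchCollection m1 = Regex.Matches(html, @"data-executing=""[^""]*""\s*>\s*(.+?)\s*</a>", RegexOptions.Singleline);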
foreach (Match m in m1)
{
// "new" is a reserved keyword in C#, so the variable needs a different name.
string linkText = m.Groups[1].Value;
Attributes.Add(linkText);
}
Attribute.DataSource = Attributes;
Upvotes: 1
Reputation: 263
First, find the single node:
var Randomtxt = htmlDoc.DocumentNode.SelectSingleNode("//*[@class='support']");
Then pull the inner text (note the capitalization: the property is InnerText, and C# is case-sensitive):
string value = Randomtxt.InnerText;
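If the page can contain more than one such link, or none at all, the same idea can be wrapped up with a null check. This is only a rough sketch, assuming the HtmlAgilityPack package is referenced and that the links can always be identified by class="support"; the class name SupportLinkScraper and the method name GetSupportLinkTexts are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SupportLinkScraper
{
    // Hypothetical helper: returns the visible text of every <a class="support"> element.
    public static List<string> GetSupportLinkTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//a[@class='support']");
        if (nodes == null)
        {
            return texts;
        }

        foreach (var node in nodes)
        {
            // InnerText drops any nested tags; DeEntitize decodes entities such as &#39;.
            texts.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }

        return texts;
    }
}
```

With the html string from the question, the result could then be bound the same way as before, e.g. Attribute.DataSource = SupportLinkScraper.GetSupportLinkTexts(html);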
Upvotes: 0