user3674312

Reputation: 33

XPath web scrape

<a class="support" style="letter-spacing: -1px" href="/support/index.php?/Knowledgebase/List/updates" data-executing="0">I'm random</a>    

I'm trying to scrape the above link using XPath. The link text "I'm random" is always changing; the rest stays the same. The "I'm random" text is what I'm looking to scrape.

I don't really understand XPath. How would I pull just the inner text? I have tried:

string html = Web.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);
var Attributes = new List<string>();
var Randomtxt = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
if (Randomtxt != null)
{
    foreach (var contents in Randomtxt)
    {
        string href = contents.InnerHtml;
        var parts = href.Split(new char[] { '=' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 1)
        {
            Attributes.Add(parts[1]);
        }
    }
    Attribute.DataSource = Attributes;
}    

But it returns nothing at all. How would I go about getting just the inner text?

Upvotes: 0

Views: 1198

Answers (2)

user3674312

Reputation: 33

Not XPath, but this works for what I want to do. Problem solved.

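    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
    List<string> Attributes = new List<string>();
    string html = Web.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
    // Group 1 captures only the link text between the opening tag's '>' and the closing '</a>'.
    MatchCollection m1 = Regex.Matches(html, @"data-executing=""[^""]*"">\s*(.+?)\s*</a>", RegexOptions.Singleline);

    foreach (Match m in m1)
    {
        string linkText = m.Groups[1].Value;
        Attributes.Add(linkText);
    }
    Attribute.DataSource = Attributes;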

Upvotes: 1

SuncoastOwner

Reputation: 263

First, find the single node:

var Randomtxt = htmlDoc.DocumentNode.SelectSingleNode("//*[@class='support']");

Then pull the inner text:

string value = Randomtxt.InnerText;
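
If there can be more than one such link, or none at all, the same idea can be wrapped with a null check. The following is only a sketch, assuming the HtmlAgilityPack NuGet package is referenced; `LinkTextExtractor` and `GetSupportLinkTexts` are names made up for illustration, and the sample HTML is the anchor from the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkTextExtractor
{
    // Hypothetical helper: returns the text of every <a class="support"> element.
    public static List<string> GetSupportLinkTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = new List<string>();

        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//a[@class='support']");
        if (nodes == null)
        {
            return texts;
        }

        foreach (var node in nodes)
        {
            // InnerText drops nested markup; DeEntitize converts entities such as &#39; back to characters.
            texts.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return texts;
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample input copied from the question.
        const string html =
            "<a class=\"support\" style=\"letter-spacing: -1px\" " +
            "href=\"/support/index.php?/Knowledgebase/List/updates\" data-executing=\"0\">I'm random</a>";

        foreach (var text in LinkTextExtractor.GetSupportLinkTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

In the question's code, the returned list could then be assigned to `Attribute.DataSource` in place of the `Attributes` list built from the split strings.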

Upvotes: 0
