Xpath Web scrape

Question

I'm random

I'm trying to scrape the link above using XPath. The link text, "I'm random", is always changing; the rest of the link stays the same. The "I'm random" text is what I want to scrape.

I don't really understand XPath. How would I pull just the inner text? I have tried:

string html = Web.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);
var Attributes = new List<string>();
var Randomtxt = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
if (Randomtxt != null)
{
    foreach (var contents in Randomtxt)
    {
        string href = contents.InnerHtml;
        var parts = href.Split(new char[] { '=' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 1)
        {
            Attributes.Add(parts[1]);
        }
    }
    Attribute.DataSource = Attributes;
}

But it returns nothing at all. How would I go about getting just the inner text?

user3674312 · Accepted Answer

This isn't XPath, but it works for what I want to do. Problem solved.

    List<string> Attributes = new List<string>();
    string html = Web.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
    MatchCollection m1 = Regex.Matches(html, @"data-executing=\s*(.+?)\s*/a>", RegexOptions.Singleline);

    foreach (Match m in m1)
    {
        // Group 1 holds the text captured by (.+?).
        string value = m.Groups[1].Value;
        Attributes.Add(value);
    }
    Attribute.DataSource = Attributes;
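
For anyone who still wants the XPath route, here is a minimal sketch with HtmlAgilityPack. It assumes the link carries a data-executing attribute (inferred from the regex above); the sample markup, the attribute value and the class name are made up for illustration, since the real anchor isn't shown in the post. The attempt in the question likely came back empty because it split InnerHtml on '=': the link text contains no '=', so parts.Length never exceeds 1 and nothing is added. Reading InnerText directly avoids that.

    using System;
    using System.Collections.Generic;
    using HtmlAgilityPack;

    class XPathInnerTextSketch
    {
        static void Main()
        {
            // Hypothetical markup standing in for the real page.
            string html = "<html><body>" +
                          "<a href=\"/page?id=1\" data-executing=\"55\">I'm random</a>" +
                          "<a href=\"/other\">Unrelated link</a>" +
                          "</body></html>";

            var htmlDoc = new HtmlDocument();
            htmlDoc.LoadHtml(html);

            var linkTexts = new List<string>();

            // Select only the anchors that have a data-executing attribute.
            var nodes = htmlDoc.DocumentNode.SelectNodes("//a[@data-executing]");
            if (nodes != null) // SelectNodes returns null when nothing matches
            {
                foreach (var node in nodes)
                {
                    // InnerText is the text between the tags; decode entities and trim.
                    linkTexts.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
                }
            }

            foreach (var text in linkTexts)
            {
                Console.WriteLine(text);
            }
        }
    }

In the original code, the same XPath and InnerText call would replace the "//a[@href]" selection and the Split logic, with the html string coming from Web.ExecuteJavascriptWithResult as before.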
