Dr.Ripper

Reputation: 89

Selenium.NoSuchElementException with dynamic tables

Please help me with this problem!

At the moment, I am scraping a website using the Selenium Firefox driver in C#. For tables covering future dates, however, the data on this website is filled in dynamically.

While the structure of the table is exactly the same for past and future dates, the tables that are being updated during my Selenium call throw a "NoSuchElementException" for IWebElements that are clearly there.

These are the relevant XPaths copied from the tables: one from a past date, for which everything works fine, and one from a future date, for which the exception is thrown. As you can see, they are identical.

XPath 18-05-2015 (past date)

/html/body/div[1]/div/div[2]/div[5]/div[1]/div/div[1]/div[2]/div[1]/div[7]/div[1]/table/tbody/tr[1]/td[1]/div/a[2]

XPath 05-02-2016 (future date)

/html/body/div[1]/div/div[2]/div[5]/div[1]/div/div[1]/div[2]/div[1]/div[7]/div[1]/table/tbody/tr[1]/td[1]/div/a[2]

Using the FindElements(By.XPath(...)) function, I use two foreach loops to go through the tr and td elements from the XPath above to get some text from the a[2] element. The DOM shown in Firefox Firebug appears identical in both cases. The only difference I have observed between the two tables is that the one for the future date updates its values every few seconds (also resetting the table when viewed via Firebug). Here is the relevant piece of code, with an important comment.

            foreach (var tr in table.FindElements(By.XPath("div/table/tbody/tr")))
            {
                foreach (var td in tr.FindElements(By.XPath("td")))
                {
                    if(td.GetAttribute("innerHTML").Contains("some stuff"))
                    {
                        // This branch is always reached, so the condition is satisfied. x is the relevant value;
                        // it is assigned properly, yet the line below still throws the exception.
                        x = td.FindElement(By.XPath("div/a[2]")).GetAttribute("href").Split('/')[4];
                        bmID = getBookmakerID(bmName);
                    }
                    if(td.GetAttribute("class").Contains("some other stuff"))
                    {

                    }
                }
            }
Have any of you had similar problems before, and were you able to solve them?

Upvotes: 1

Views: 810

Answers (2)

Dr.Ripper

Reputation: 89

Thank you very much for helping. @Buaban - I have added the waits, but I am afraid that didn't change much. It did let the algorithm get further, but eventually it broke down.

In the end, we solved it by using a combination of the Selenium WebDriver and the HtmlAgilityPack. As the code is too specific to actually post (and I don't have it available at the moment), I will share the main philosophy, which is short:

Use the Selenium WebDriver to open and navigate the browser, i.e. to perform actions on the page

Use the HtmlAgilityPack to parse the page source and rip the desired web elements

In conclusion, this approach to handling self-refreshing pages has proven to be extremely stable (it hasn't failed once so far), extremely fast (since the HTML is parsed as a plain string) and flexible (as it uses specialized packages both to navigate the browser and to rip data from it).
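To make the split concrete, here is a minimal sketch of that two-package approach. The URL and the table XPath are placeholders, not the real site; the snippet assumes the Selenium WebDriver and HtmlAgilityPack NuGet packages are installed:

```csharp
using System;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

class Scraper
{
    static void Main()
    {
        using (IWebDriver driver = new FirefoxDriver())
        {
            // 1) Selenium handles navigation (and any clicks/logins the page needs).
            driver.Navigate().GoToUrl("http://example.com/matches"); // placeholder URL

            // 2) Snapshot the page as a plain string. The live DOM may keep
            //    refreshing, but this string cannot change under us.
            string html = driver.PageSource;

            // 3) HtmlAgilityPack parses the frozen snapshot offline.
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Placeholder XPath; SelectNodes returns null when nothing matches.
            var links = doc.DocumentNode.SelectNodes("//table/tbody/tr/td/div/a[2]");
            if (links != null)
            {
                foreach (var link in links)
                {
                    string href = link.GetAttributeValue("href", "");
                    Console.WriteLine(href);
                }
            }
        }
    }
}
```

Because every element lookup happens on a frozen copy of the HTML, the self-refreshing table can no longer invalidate an element between the existence check and the read.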

Happy coding!

Upvotes: 2

Buaban

Reputation: 5137

Could you add a wait to every step where you call FindElement? See the example below:

using System;
using System.Collections.ObjectModel;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

IWait<IWebElement> wait = new DefaultWait<IWebElement>(table);
wait.Timeout = TimeSpan.FromSeconds(5);
wait.PollingInterval = TimeSpan.FromMilliseconds(300);
By locator = By.XPath("div/table/tbody/tr");
ReadOnlyCollection<IWebElement> rows;

wait.Until(e => e.FindElements(locator).Count > 0);
rows = table.FindElements(locator);


foreach (var tr in rows)
{

    wait = new DefaultWait<IWebElement>(tr);
    wait.Timeout = TimeSpan.FromSeconds(5);
    wait.PollingInterval = TimeSpan.FromMilliseconds(300);
    locator = By.XPath("td");
    ReadOnlyCollection<IWebElement> cells;

    wait.Until(e => e.FindElements(locator).Count > 0);
    cells = tr.FindElements(locator);

    foreach (var td in cells)
    {
        if (td.GetAttribute("innerHTML").Contains("some stuff"))
        {
            // This branch is always reached, so the condition is satisfied. x is the relevant value;
            // it is assigned properly, yet the lookup below still throws the exception.
            wait = new DefaultWait<IWebElement>(td);
            wait.Timeout = TimeSpan.FromSeconds(5);
            wait.PollingInterval = TimeSpan.FromMilliseconds(300);
            locator = By.XPath("div/a[2]");
            IWebElement link2;

            wait.Until(e => e.FindElements(locator).Count > 0);
            try
            {
                link2 = td.FindElement(locator);
            }
            catch (NoSuchElementException ex)
            {
                throw new NoSuchElementException("Unable to find element, locator: \"" + locator + "\".", ex);
            }
            x = link2.GetAttribute("href").Split('/')[4];
            bmID = getBookmakerID(bmName);
        }
        if (td.GetAttribute("class").Contains("some other stuff"))
        {

        }
    }
}

If it still throws an error, you can easily debug the test in Visual Studio.

Upvotes: 2
