Muhammad Mateen

Reputation: 138

Html Agility Pack returns JavaScript code instead of the actual HTML

I want to collect the links from a website in a C# console application using Html Agility Pack, but the li and a tags contain JavaScript code instead of real URLs. I don't know why JavaScript changes the markup on click. How can I get the actual HTML?

<li onmouseover="activate_menu('top-menu-61', 61); void(0);" onmouseout="deactivate_menu('top-menu-61', 61);"><a href="javascript:void();

This is all I can see in my li and a tags. How can I resolve this and get the actual HTML so that I can extract the links?

Upvotes: 0

Views: 1058

Answers (1)

har07

Reputation: 89285

Html Agility Pack only parses the markup it is given and does not execute JavaScript, so any content the page builds with scripts never reaches it. Try using a browser automation tool such as Selenium WebDriver to render the page fully in a real browser, then pass the resulting HTML to HtmlAgilityPack for parsing. Using Selenium is fairly easy, as the example below shows. You only need to make sure that the required tools (the Selenium library and the driver for your browser of choice) are installed properly beforehand:

using HtmlAgilityPack;
using OpenQA.Selenium.Chrome;

// Initialize the Chrome driver (or any other supported browser)
using (var driver = new ChromeDriver())
{
    // Open the target page
    driver.Navigate().GoToUrl("the_target_page_url_here");

    // Add Selenium waits here if needed,
    // to wait until certain elements appear in the page

    // Pass the rendered HTML to HAP's HtmlDocument
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(driver.PageSource);
}
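
Once the rendered markup has been loaded, the links can be pulled out with an XPath query. The following is only a minimal sketch: the `LinkExtractor` class and `ExtractLinks` method are hypothetical names made up for illustration, and skipping `javascript:` pseudo-links is an assumption about what counts as a useful link here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper (not part of HAP or Selenium), for illustration only.
public static class LinkExtractor
{
    // Returns the distinct href values found in the given HTML,
    // skipping "javascript:" pseudo-links such as javascript:void();
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return new List<string>();
        }

        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty).Trim())
            .Where(href => href.Length > 0 &&
                           !href.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase))
            .Distinct()
            .ToList();
    }
}
```

Inside the using block, this could be called as LinkExtractor.ExtractLinks(driver.PageSource). Relative hrefs are returned as written in the markup, so they may still need to be resolved against the page URL.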

Selenium also provides ways to locate elements within a page, so it is possible to replace HAP completely with Selenium, if you want.
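
As a rough sketch of that Selenium-only route, the snippet below waits for anchors to appear and then reads their href values directly from the browser. The URL is a placeholder, the 10-second timeout and the "wait for any a element" condition are assumptions to adapt to the real page, and WebDriverWait may require the Selenium.Support package depending on the Selenium version.

```csharp
using System;
using System.Linq;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class Program
{
    static void Main()
    {
        using (var driver = new ChromeDriver())
        {
            // Placeholder URL: replace with the real target page.
            driver.Navigate().GoToUrl("https://example.com/");

            // Assumed wait condition: at least one <a> element is present.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
            wait.Until(d => d.FindElements(By.TagName("a")).Count > 0);

            // Read href from every anchor; GetAttribute may be flagged as
            // obsolete in recent Selenium releases but is still available.
            var links = driver.FindElements(By.TagName("a"))
                .Select(a => a.GetAttribute("href"))
                .Where(href => !string.IsNullOrEmpty(href) &&
                               !href.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase))
                .Distinct()
                .ToList();

            foreach (var link in links)
            {
                Console.WriteLine(link);
            }
        }
    }
}
```

If the menu entries are only created after a mouseover, as the onmouseover handlers in the question suggest, the wait condition would need to target those generated elements instead.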

Upvotes: 1
