BenW

Reputation: 1453

Crawl a webpage and grab all dynamic Javascript links

I am using C# to crawl a website. Everything works fine except that it can't detect dynamic JavaScript links. For example, a listing of over 100 products may be split across several pages, and the "Next Page" and "Prev Page" links may be dynamic JavaScript URLs that are generated on click. Typical markup is below:

<a href="javascript:PageURL('cf-233--televisions.aspx','?',2);">&gt;</a>

Is there any way to get the actual link from the above href while collecting URLs on the page?

I am using Html Agility Pack but am open to any other technology. I have searched Google for this many times but have not found a solution yet.

Thanks.

Upvotes: 2

Views: 1907

Answers (2)

sjdirect

Reputation: 2364

AbotX allows you to render the JavaScript on the page. It's a powerful web crawler with advanced features.

Upvotes: 0

Isk1n

Reputation: 381

Have you tried evaluating the JavaScript to get the actual hrefs? This might be helpful: "Parsing HTML to get script variable value".

Or maybe you should check what the PageURL function does and rewrite it in C#. Just open the website in a browser and type PageURL (without parentheses) in its console; it will show you the code of the function.
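A rough sketch of the "rewrite it in C#" idea, using only the base class library. The regular expression pulls the three arguments out of an href shaped like the one in the question. The body of `PageUrl` is a placeholder: the real PageURL function is not shown in the question, so the `page=` query parameter here is purely an assumption and should be replaced with whatever the browser console reveals. `JsLinkResolver`, `PageUrl` and `TryResolve` are hypothetical names made up for this example.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class JsLinkResolver
{
    // Matches href values such as:
    //   javascript:PageURL('cf-233--televisions.aspx','?',2);
    // Whitespace around the arguments (including line breaks) is tolerated.
    private static readonly Regex PageUrlCall = new Regex(
        @"javascript:\s*PageURL\(\s*'(?<path>[^']*)'\s*,\s*'(?<sep>[^']*)'\s*,\s*(?<page>\d{1,9})\s*\)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // ASSUMPTION: placeholder port of the site's PageURL function.
    // The real function is unknown; "page=" is a guess for illustration only.
    public static string PageUrl(string path, string separator, int page)
    {
        return path + separator + "page=" + page;
    }

    // Tries to turn a javascript:PageURL(...) href into an absolute URL.
    // Returns false (and an empty url) when the href is not a PageURL call.
    public static bool TryResolve(Uri baseUri, string href, out string url)
    {
        url = string.Empty;
        if (string.IsNullOrEmpty(href)) return false;

        // Attribute values read from raw HTML may still be HTML-encoded.
        string decoded = WebUtility.HtmlDecode(href);
        Match m = PageUrlCall.Match(decoded);
        if (!m.Success) return false;

        string path = m.Groups["path"].Value.Trim();
        string separator = m.Groups["sep"].Value;
        int page = int.Parse(m.Groups["page"].Value);

        // Resolve the relative result against the URL of the crawled page.
        url = new Uri(baseUri, PageUrl(path, separator, page)).AbsoluteUri;
        return true;
    }
}
```

While collecting links, each href value (for example, what Html Agility Pack returns from `GetAttributeValue("href", "")`) could be passed to `TryResolve` together with the URL of the page being crawled. Hrefs that are not PageURL calls return false and can be handled as ordinary links.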

Upvotes: 1
