Reputation: 1453
I am using C# to crawl a website. Everything works except that the crawler can't detect dynamic JavaScript links. For example, a category with over 100 products may span several pages, and its "Next Page" and "Prev Page" links may be JavaScript URLs that are only resolved on click. A typical link looks like this:
<a href="javascript:PageURL('cf-233--televisions.aspx','?',2);">></a>
Is there any way to get the actual URL behind the above href while collecting URLs on the page?
I am using Html Agility Pack but am open to any other technology. I have searched Google for this many times but haven't found a solution yet.
Thanks.
Upvotes: 2
Views: 1907
Reputation: 2364
AbotX allows you to render the JavaScript on the page. It's a powerful web crawler with advanced features.
Upvotes: 0
Reputation: 381
Have you tried evaluating the JavaScript to get the actual hrefs? This related question might be helpful: "Parsing HTML to get script variable value".
Or you could check what the PageURL function does and rewrite it in C#. Open the website in a browser and type PageURL (without parentheses) in its console; it will show you the function's source code.
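As a rough sketch of the "rewrite it in C#" route: the snippet below only pulls the three arguments out of a `javascript:PageURL(...)` href, using the argument shape shown in the question (two quoted strings and a number). The names `JsLinkParser`, `TryParse` and `Resolve` are made up for illustration, and the `buildUrl` delegate is a placeholder for whatever the site's real PageURL function does, which is not shown in this thread. The href strings themselves would come from your existing Html Agility Pack link collection.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of any library): parses hrefs of the form
// javascript:PageURL('cf-233--televisions.aspx','?',2);
public static class JsLinkParser
{
    // Assumes two single-quoted string arguments followed by an integer,
    // as in the example from the question.
    private static readonly Regex PageUrlCall = new Regex(
        @"^\s*javascript:\s*PageURL\(\s*'(?<path>[^']*)'\s*,\s*'(?<sep>[^']*)'\s*,\s*(?<page>\d+)\s*\)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns true and fills the out parameters when href is a PageURL call.
    public static bool TryParse(string href, out string path, out string separator, out int page)
    {
        path = null;
        separator = null;
        page = 0;
        if (string.IsNullOrEmpty(href)) return false;

        Match m = PageUrlCall.Match(href);
        if (!m.Success) return false;

        // Trim tolerates stray whitespace or line breaks inside the quoted path.
        path = m.Groups["path"].Value.Trim();
        separator = m.Groups["sep"].Value;
        return int.TryParse(m.Groups["page"].Value, out page);
    }

    // buildUrl must mirror the site's own PageURL logic, which you need to
    // read from the browser console first. Returns null for other hrefs.
    public static string Resolve(string href, Func<string, string, int, string> buildUrl)
    {
        string path, separator;
        int page;
        return TryParse(href, out path, out separator, out page)
            ? buildUrl(path, separator, page)
            : null;
    }
}
```

For example, if the site's function turned out to simply append a page parameter (a pure assumption here, as is the parameter name `page`), you could call `JsLinkParser.Resolve(href, (p, s, n) => p + s + "page=" + n)` and then combine the result with the page's base URL via `new Uri(baseUri, relative)`.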
Upvotes: 1