Reputation: 1749
I'm using BeautifulSoup with Python. I'm trying to get elements from a link containing a hash (#). It's a pagination link, and the part after the # is the page number.
It doesn't work. As I understand it, the problem is that urllib2 can't handle this: the part of the URL after the # is for client-side handling and is never sent to the server.
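To illustrate why the page number never reaches the server, here is a minimal sketch using Python 3's standard library (urllib.parse rather than urllib2). The URL is only a placeholder, and the http:// scheme is an assumption added for the example.

```python
from urllib.parse import urldefrag

# Placeholder URL; the scheme is assumed for illustration.
url = "http://www.myserver.com/products#/page-2"

# urldefrag splits off the fragment, i.e. everything after the '#'.
request_url, fragment = urldefrag(url)

print(request_url)  # http://www.myserver.com/products  (what an HTTP client requests)
print(fragment)     # /page-2                           (stays on the client side)
```

An HTTP request for this URL only ever asks for the first part, so every page number returns the same HTML unless JavaScript in a browser acts on the fragment.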
So I checked the real URL using the Network tab of the developer tools in Chrome, and it gives me this:
It looks like the server doesn't like this URL at all, because it returns a blank page containing only this weird result: {"filtersBlock":"\n\n
So my question is: is there a way to handle this kind of link with BeautifulSoup?
Upvotes: 1
Views: 916
Reputation: 1749
I found a way to do this: use Selenium to load the links containing a # and BeautifulSoup to crawl the resulting DOM. Simply passing the link containing the # to the Selenium driver with driver.get("www.myserver.com/products#/page-2") works.
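A minimal sketch of that approach, assuming Python 3 with the selenium and beautifulsoup4 packages installed and Chrome available. The helper name rendered_html, the http:// scheme on the placeholder URL, and the choice to print every link are assumptions for illustration, not part of the original setup.

```python
def rendered_html(driver, url):
    """Load url in the browser behind driver and return its current DOM as HTML."""
    driver.get(url)  # the browser keeps the '#/page-2' part and lets the page act on it
    return driver.page_source


def main():
    # Third-party imports are local so rendered_html can be used without them.
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and a matching driver are installed
    try:
        # Placeholder URL; the http:// scheme is added here as an assumption.
        html = rendered_html(driver, "http://www.myserver.com/products#/page-2")
    finally:
        driver.quit()  # always close the browser, even if loading fails

    soup = BeautifulSoup(html, "html.parser")
    # Hypothetical target: print the href of every link on the rendered page.
    for link in soup.find_all("a", href=True):
        print(link["href"])
```

Calling main() runs the whole thing. If the page fills in its content asynchronously, page_source may be read too early, so an explicit wait (for example Selenium's WebDriverWait) before reading it may be needed.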
Upvotes: 1