Laurent

Reputation: 1749

BeautifulSoup and links containing a hash (#)

I'm using BeautifulSoup with Python. I'm trying to get elements from the page behind a link that contains a hash (#). It's a pagination link, and the part after the # is the page number.

It doesn't work. As I understand it, the problem is that urllib2 can't handle this: the part of the URL after the # is meant for client-side handling and is never sent to the server.
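To illustrate the point about the fragment, here is a minimal sketch using Python 3's `urllib.parse` (the question mentions urllib2, so the Python 3 module is an assumption on my part, and the example URL is only illustrative). It shows which part of the link an HTTP client would actually request:

```python
from urllib.parse import urldefrag, urlsplit

# Illustrative pagination link; the host is the placeholder used in this thread.
url = "http://www.myserver.com/products#/page-2"

parts = urlsplit(url)
print(parts.path)      # /products
print(parts.fragment)  # /page-2

# urldefrag() separates what is requested from the client-side fragment.
request_url, fragment = urldefrag(url)
print(request_url)     # http://www.myserver.com/products
```

Every page number therefore maps to the same request, which is why fetching the link directly always returns the same content.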

So I checked the real URL in the Network tab of Chrome's developer tools, and it gives me this:

http://www.myserver.com/modules/blocklayered/blocklayered-ajax.php?_=1486617675431&id_category_layered=24&layered_weight_slider=0_10&layered_price_slider=21_2991&orderby=position&orderway=desc&n=20&p=3

It looks like the server doesn't like this URL at all, because it returns a blank page containing only this odd result: {"filtersBlock":"\n\n
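The truncated result above starts like a JSON object, so one possibility is that the endpoint returns JSON whose string values hold HTML fragments. The sketch below rests on that assumption. Only the `filtersBlock` key comes from the output above; the helper `html_fragments`, the other keys, and all the values are hypothetical.

```python
import json


def html_fragments(body):
    """Decode a JSON response body and keep only its string values."""
    payload = json.loads(body)
    return {key: value for key, value in payload.items() if isinstance(value, str)}


# Hypothetical response body: only "filtersBlock" is taken from the question.
sample_body = json.dumps({
    "filtersBlock": "\n\n<div id=\"layered_block\"></div>",
    "productList": "<ul><li>Example product</li></ul>",  # hypothetical key
    "nbProducts": 1,                                      # hypothetical key
})

fragments = html_fragments(sample_body)
print(sorted(fragments))  # ['filtersBlock', 'productList']
```

If the real response has this shape, each fragment could then be parsed on its own, for example with `BeautifulSoup(fragments["filtersBlock"], "html.parser")`.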

So my question is: is there a way to handle this kind of link with BeautifulSoup?

Upvotes: 1

Views: 916

Answers (1)

Laurent

Reputation: 1749

I found a way to do this: use Selenium to load the links containing a #, and BeautifulSoup to crawl the resulting DOM. Passing the full link, fragment included, to the Selenium driver works: driver.get("http://www.myserver.com/products#/page-2"). Note that the URL needs its scheme (http://) for driver.get to accept it.
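A minimal sketch of that approach, assuming Selenium and BeautifulSoup (`bs4`) are installed and that Chrome with a matching driver is available. The helper names `with_scheme` and `fetch_page_soup` are made up for this example, and pages that load their content asynchronously may also need an explicit wait before reading the DOM.

```python
from urllib.parse import urlsplit


def with_scheme(url, default="http"):
    """Return url unchanged if it already has a scheme, else prefix one."""
    return url if urlsplit(url).scheme else f"{default}://{url}"


def fetch_page_soup(url):
    """Load url in a real browser and return the rendered DOM as soup."""
    # Imported here so with_scheme() stays usable without Selenium installed.
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome is the browser in use
    try:
        # The browser keeps the #fragment and runs the page's own scripts.
        driver.get(with_scheme(url))
        return BeautifulSoup(driver.page_source, "html.parser")
    finally:
        driver.quit()


# Example call (opens a browser window):
# soup = fetch_page_soup("www.myserver.com/products#/page-2")
```

The browser does the client-side work that urllib2 cannot, so BeautifulSoup only ever sees the DOM for the requested page.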

Upvotes: 1
