BeautifulSoup and links containing a hash (#)

Question

I'm using BeautifulSoup with Python and trying to get elements from a page whose link contains a hash (#). It's a pagination link; the part after the # is the page number.

It doesn't work. As I understand it, urllib2 can't handle this because the part of the URL after the # is handled on the client side and is never sent to the server.
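To illustrate the point about the fragment, here is a minimal sketch using Python 3's urllib.parse (the question itself uses urllib2 from Python 2). The URL is only an example of a paginated link on this site; the variable names are illustrative.

```python
from urllib.parse import urldefrag, urlsplit

# Example paginated link; the page number lives in the fragment after '#'.
page_link = "http://www.myserver.com/products#/page-2"

# urldefrag separates the part that is requested from the server
# from the fragment, which stays in the browser.
request_url, fragment = urldefrag(page_link)
print(request_url)  # http://www.myserver.com/products
print(fragment)     # /page-2

# urlsplit exposes the same information as named fields.
parts = urlsplit(page_link)
print(parts.path)      # /products
print(parts.fragment)  # /page-2
```

Whatever the fragment says, an HTTP client only ever requests the first part, so every "page" link fetches the same document from the server.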

So I checked the real URL using the Network tab of the Chrome developer tools, and it gives me this:

http://www.myserver.com/modules/blocklayered/blocklayered-ajax.php?_=1486617675431&id_category_layered=24&layered_weight_slider=0_10&layered_price_slider=21_2991&orderby=position&orderway=desc&n=20&p=3

The server doesn't seem to like this URL at all: it returns a blank page containing only this weird result: {"filtersBlock":"

So my question is: is there a way to handle this kind of link with BeautifulSoup?

Laurent · Accepted Answer

I found a way to do this, using Selenium to load the links containing a # and BeautifulSoup to crawl the DOM. Just passing the link containing the # to the Selenium driver with driver.get("http://www.myserver.com/products#/page-2") works (note that Selenium needs the full URL, including the http:// scheme).
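A minimal sketch of how the two pieces might fit together. The helper name soup_from_rendered_page, the choice of Chrome, and the loop over a tags are assumptions for illustration, not the exact code I used.

```python
from bs4 import BeautifulSoup


def soup_from_rendered_page(driver, url):
    """Load url in a Selenium driver and parse the DOM the browser built."""
    # The browser runs the page's JavaScript, which is what reacts to the '#' part.
    driver.get(url)
    return BeautifulSoup(driver.page_source, "html.parser")


def main():
    # Imported here so the helper above does not require Selenium by itself.
    from selenium import webdriver

    # Assumption: Chrome and a matching driver are available on this machine.
    driver = webdriver.Chrome()
    try:
        soup = soup_from_rendered_page(
            driver, "http://www.myserver.com/products#/page-2"
        )
        # Illustrative only: print the target of every link on the rendered page.
        for link in soup.find_all("a"):
            print(link.get("href"))
    finally:
        driver.quit()
```

Calling main() runs it against a real browser. If the page fills in its content asynchronously, page_source may be read before the new page of results has arrived, so a wait before parsing may be needed.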
