TheIrishPizzaGuy

Reputation: 21

beautiful soup url with # in it

I am trying to scrape a website that has ?, = and # in the URL. When I request it, I get redirected. I think I have narrowed the problem character down to the #; it looks like it is being percent-encoded. In my case the # is not an anchor; it is used to sort items across many pages.

EDIT: I now think requests is causing the problem: the part after the # is typically a client-only parameter that is never sent to the server.
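To make the EDIT concrete, here is a minimal sketch using only the standard library. It shows how a URL divides into the part a client sends to the server and the fragment that stays on the client. The `https://` scheme and the helper name `split_fragment` are assumptions added for illustration; the URLs are the placeholder ones from this question.

```python
from urllib.parse import urldefrag


def split_fragment(url):
    """Return (part sent to the server, client-side fragment) for a URL."""
    result = urldefrag(url)
    return result.url, result.fragment


# A literal '#' starts the fragment, so only the part before it is requested.
print(split_fragment("https://www.foo.com/example/#page1"))

# A percent-encoded '#' (%23) is ordinary path text, so the server is asked
# for a different path and there is no fragment at all.
print(split_fragment("https://www.foo.com/example/%23page1"))
```

If this is what is happening, the server sees the same request for `/example/#page1` as for `/example/`, and whatever `#page1` selects would be applied afterwards by JavaScript in the browser.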

Working URLs

www.foo.com/

www.foo.com/example

www.foo.com/example/search?q=&%5B%5D

Bad URLs (all pull up the same non-erroring page, even in browser)

www.foo.com/example/#page1

www.foo.com/example/%23page1 (percent encoded #)

www.foo.com/example/something_that_does_not_exsit

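import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.foo.com/example/#page1")  # requests needs an explicit scheme
print(response.url)  # final URL after any redirects
soup = BeautifulSoup(response.text, 'html.parser')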

Upvotes: 0

Views: 145

Answers (1)

mallocation

Reputation: 536

Have you checked out this related question? It might be useful. Also look into Selenium.

Beautifulsoup and link with a hash #

Upvotes: 1
