Reputation: 21
I am trying to scrape a website with ?, = and # in the URL. When I do, I get redirected. I think I have narrowed the problem character down to the #, and I suspect it is being percent-encoded. In my case it is not an anchor; it sorts items across many pages.
EDIT: I think requests is causing the problem: the # part is typically a client-only parameter (the URL fragment) and is not sent to the server.
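To illustrate the point in the EDIT, here is a minimal sketch using only the standard library's `urllib.parse`. The helper `request_target` is a hypothetical name made up for this example, not part of requests; it approximates the path-plus-query string an HTTP client puts on the request line. It suggests why the `#page1` URL asks the server for the same resource as the plain `/example/` URL, while the percent-encoded `%23page1` version asks for a different path altogether.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Hypothetical helper: approximate the path-plus-query an HTTP client requests."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # The fragment (everything after '#') is deliberately left out:
    # it stays on the client and is not part of the request target.
    return f"{path}?{parts.query}" if parts.query else path


for url in (
    "http://www.foo.com/example/#page1",
    "http://www.foo.com/example/%23page1",
    "http://www.foo.com/example/search?q=&%5B%5D",
):
    parts = urlsplit(url)
    print(f"target={request_target(url)!r} fragment={parts.fragment!r}")
```

If the site applies the `#...` part with JavaScript in the browser, no plain HTTP request will carry it, which would be consistent with the redirect-like behaviour described above.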
Working URLs
www.foo.com/
www.foo.com/example
www.foo.com/example/search?q=&%5B%5D
Bad URLs (all pull up the same non-erroring page, even in browser)
www.foo.com/example/#page1
www.foo.com/example/%23page1 (percent encoded #)
www.foo.com/example/something_that_does_not_exsit
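import requests
from bs4 import BeautifulSoup

response = requests.get("http://www.foo.com/example/#page1")  # requests needs an explicit scheme
response.url  # final URL after any redirects
soup = BeautifulSoup(response.text, 'html.parser')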
Upvotes: 0
Views: 145
Reputation: 536
Have you checked this out? It might be useful. Also, look into Selenium.
Upvotes: 1