Reputation: 245
I came across a situation when I used Python Requests or urllib2 to open urls. I got 404 'page not found' responses. For example, url = 'https://www.facebook.com/mojombo'. However, I can copy and paste those urls in browser and visit them. Why does this happen?
I need to get some content from those pages' html source code. Since I can't open those urls using Requests or urllib2, I can't use BeautifulSoup to extract element from html source code. Is there a way to get those page's source code and extract content form it utilizing Python?
Although this is a general question, I still need some working code to solve it. Thanks!
Upvotes: 1
Views: 456
Reputation: 26
It looks like your browser is using cookies to log you in. Try opening that url in a private or incognito tab, and you'll probably not be able to access it.
However, if you are using Requests, you can pass the appropriate login information as a dictionary of values. You'll need to check the form information to see what the fields are, but Requests can handle that as well.
The normal format would be:
payload = {
'username': 'your username',
'password': 'your password'
}
p = requests.post(myurl, data=payload)
with more or less fields added as needed.
Upvotes: 1