Shengjie Zhang

Reputation: 245

Python urllib2 response 404 error but url can be opened

I've run into a situation where opening certain URLs with Python Requests or urllib2 returns a 404 'page not found' response, for example url = 'https://www.facebook.com/mojombo'. However, I can copy and paste those URLs into a browser and visit them just fine. Why does this happen?

I need to get some content from those pages' HTML source code. Since I can't open those URLs using Requests or urllib2, I can't use BeautifulSoup to extract elements from the HTML. Is there a way to get those pages' source code and extract content from it using Python?
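Here is a stripped-down version of what I'm trying to do (the title lookup at the end is just a placeholder for whatever element I actually need):

import requests
from bs4 import BeautifulSoup

url = 'https://www.facebook.com/mojombo'
r = requests.get(url)
print(r.status_code)   # prints 404 instead of 200

# What I want to do once I can actually get the page:
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.title)      # placeholder for the real extraction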

Although this is a general question, I still need some working code to solve it. Thanks!

Upvotes: 1

Views: 456

Answers (1)

HansGroober

Reputation: 26

It looks like your browser is using cookies to log you in. Try opening that URL in a private or incognito tab, and you probably won't be able to access it.
However, if you are using Requests, you can pass the appropriate login information as a dictionary of values. You'll need to check the form to see what the fields are called, but Requests can handle that as well. The usual pattern looks like this:

import requests

payload = {
    'username': 'your username',   # field names depend on the site's login form
    'password': 'your password',
}
p = requests.post(myurl, data=payload)   # myurl is the address the login form posts to

with fields added or removed as needed.
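If the page only loads while you're logged in, a requests.Session will hold on to the cookies set by that login post and send them on later requests. The login URL and field names below are placeholders, so this is only a rough sketch; inspect the site's real form to find the actual values:

import requests

session = requests.Session()

# Placeholder login URL and field names -- check the actual form
payload = {
    'username': 'your username',
    'password': 'your password',
}
session.post('https://www.example.com/login', data=payload)

# Cookies from the login are sent automatically on later requests
r = session.get('https://www.facebook.com/mojombo')
print(r.status_code)
print(r.text[:200])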

Upvotes: 1
