Reputation: 1982
I'm reading a website's content using the following three lines. As an example, I used a domain that is for sale and doesn't have much content.
url = "http://localbusiness.com/"
response = requests.get(url)
html = response.text
It returns the following HTML, even though the website contains more HTML when you check it through view source. Am I doing something wrong here?
Python version 2.7
<html><head></head><body><!-- vbe --></body></html>
Upvotes: 2
Views: 15776
Reputation: 291
@jason answered it correctly, so I am extending his answer to explain the reason.
Why It happens
Other alternatives
You can use Python's mechanize module to mimic a browser and fool a website (this comes in handy when the site uses some sort of authentication cookies). A small tutorial
Use selenium to drive an actual browser
Upvotes: 1
Reputation: 12613
Try setting a User-Agent:
import requests
url = "http://localbusiness.com/"
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
'Content-Type': 'text/html',
}
response = requests.get(url, headers=headers)
html = response.text
The default User-Agent set by requests is 'User-Agent': 'python-requests/2.8.1'. Try to simulate that the request is coming from a browser and not a script.
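To see which User-Agent would actually be sent, without touching the network, something like the sketch below may help. It prepares the request through a Session and inspects the headers. The helper name build_request and the constant BROWSER_UA are made up for illustration, and the exact default string depends on the installed requests version.

```python
import requests

# Browser-like User-Agent string, copied from the headers dict above.
BROWSER_UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36')


def build_request(url, user_agent=None):
    """Prepare (but do not send) a GET request so its headers can be inspected.

    Hypothetical helper, not part of requests itself.
    """
    session = requests.Session()
    if user_agent is not None:
        # Headers set on the session are merged into every request it prepares.
        session.headers['User-Agent'] = user_agent
    return session.prepare_request(requests.Request('GET', url))


default_req = build_request("http://localbusiness.com/")
browser_req = build_request("http://localbusiness.com/", BROWSER_UA)

# The default starts with "python-requests/" followed by the installed version.
print(default_req.headers['User-Agent'])
print(browser_req.headers['User-Agent'])
```

Setting the header on a Session rather than on each call means every later request made through that session carries the same User-Agent, which is handy if you fetch more than one page.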
Upvotes: 6