rclakmal

Reputation: 1982

Python getting HTML content via 'requests' returns partial response

I'm reading a website's content using the following three lines. As an example I used a domain that is for sale, which doesn't have much content.

import requests

url = "http://localbusiness.com/"
response = requests.get(url)
html = response.text

It returns the following HTML, although the website contains more HTML when you check it through view source. Am I doing something wrong here?

Python version 2.7

<html><head></head><body><!-- vbe --></body></html>

Upvotes: 2

Views: 15776

Answers (2)

Satyapal Sharma

Reputation: 291

@jason answered it correctly, so I am extending his answer to explain the reason.

Why it happens

  1. Some DOM elements are created or changed by Ajax calls and JavaScript after the page loads, so they will not appear in the response to your call. (That is not the case here, though, because you are already comparing against view source (Ctrl+U) and not inspect element.)
  2. Some sites use the User-Agent header to determine the kind of client (desktop or mobile, for example) and tailor the response accordingly, which is the probable case here.
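To make the second point concrete, here is a minimal sketch of User-Agent sniffing that runs entirely on your own machine. Everything in it is made up for illustration: the `SniffingHandler` server, the `FULL_PAGE` and `STUB_PAGE` bodies, and the `fetch` and `run_demo` helpers are hypothetical, and nothing here claims to be what the site in the question actually does. It targets Python 3 and uses the standard library's `urllib` in place of requests so that it has no dependencies.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Hypothetical page bodies, invented for this demo.
FULL_PAGE = b"<html><head></head><body><h1>Full page</h1></body></html>"
STUB_PAGE = b"<html><head></head><body><!-- stub --></body></html>"


class SniffingHandler(BaseHTTPRequestHandler):
    """Toy server that serves a stub page to clients that do not look like a browser."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        # Crude check: most browser User-Agent strings start with "Mozilla".
        body = FULL_PAGE if "Mozilla" in agent else STUB_PAGE
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress the per-request log lines


def fetch(url, user_agent):
    """Fetch url with the given User-Agent and return the body as text."""
    opener = build_opener(ProxyHandler({}))  # ignore any proxy settings
    request = Request(url, headers={"User-Agent": user_agent})
    with opener.open(request) as response:
        return response.read().decode("utf-8")


def run_demo():
    """Start the toy server on a free local port and fetch it with two User-Agents."""
    server = HTTPServer(("127.0.0.1", 0), SniffingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        as_script = fetch(url, "python-requests/2.8.1")
        as_browser = fetch(url, "Mozilla/5.0 (Windows NT 6.1; WOW64)")
        return as_script, as_browser
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` returns two different bodies for the same URL: the stub for the script-like User-Agent and the full page for the browser-like one. A real site may key on other signals as well (cookies, other headers), so treat this only as an illustration of the mechanism.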

Other alternatives

  1. You can use Python's mechanize module to mimic a browser and fool a website (it comes in handy when the site uses some sort of authentication cookies).

  2. Use Selenium to drive an actual browser.

Upvotes: 1

JRodDynamite

Reputation: 12613

Try setting a User-Agent:

import requests

url = "http://localbusiness.com/"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
    'Accept': 'text/html',  # Content-Type would describe a request body, which a GET does not have
}

response = requests.get(url, headers=headers)
html = response.text

By default, requests sends a User-Agent of the form python-requests/<version> (for example, 'User-Agent': 'python-requests/2.8.1'). Set a browser-like User-Agent so that the request appears to come from a browser and not a script.
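If you want to check what requests will send before making any network call, something like the following sketch may help. The helper names `build_browser_session` and `effective_user_agent` are mine, not part of requests; the calls they rely on (`requests.utils.default_user_agent`, `Session.headers`, `Session.prepare_request`) are part of the library. Setting the header on a `Session` also means every later request through that session reuses it.

```python
import requests
from requests.utils import default_user_agent

# Same browser-like string as in the snippet above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
)


def build_browser_session(user_agent=BROWSER_UA):
    """Return a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def effective_user_agent(session, url):
    """Return the User-Agent the session would send to url, without sending anything."""
    prepared = session.prepare_request(requests.Request("GET", url))
    return prepared.headers["User-Agent"]
```

For example, `effective_user_agent(requests.Session(), url)` gives the library default (the same value as `default_user_agent()`), while `effective_user_agent(build_browser_session(), url)` gives the browser string. After that, `build_browser_session().get(url).text` performs the actual request.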

Upvotes: 6
