MITHU

Reputation: 154

Script gives me different results than the site displays

I've created a Python script to fetch the number of results the site displays. I've tried it with two links, and both give me a different result count from what I see in the browser. However, the counts I expect are present in the page source, so requests should be able to fetch them.

What I've written so far:

import requests
from bs4 import BeautifulSoup

links = [
    'https://www.zillow.com/homes/Houston,-MN_rb/',
    'https://www.zillow.com/homes/Houston,-TX_rb/'
]

for link in links:
    res = requests.get(link,headers={"User-Agent":"Mozilla/5.0"})
    soup = BeautifulSoup(res.text,"lxml")
    total_result = soup.select_one(".result-count").text
    print(total_result)

Result I'm getting:

18 results
17,575 results

Expected output, which I see when I browse manually using Chrome:

10 results
8,345 results

How can I get the exact result that the site displays?

Upvotes: 0

Views: 111

Answers (1)

PauZen

Reputation: 102

You can check the following Stack Overflow question: "Python requests isn't giving me the same HTML as my browser is".

Even though that question is old, it still seems to apply (at least according to my own test, which is hardly conclusive). To be honest, I have no idea why, and it would be interesting to dig into the code to understand.

Here is a version using urllib that does what you want and consistently returns the same result as the browser.

import urllib.request

from bs4 import BeautifulSoup

links = [
    'https://www.zillow.com/homes/Houston,-MN_rb/',
    'https://www.zillow.com/homes/Houston,-TX_rb/'
]

headers = {"User-Agent": "Mozilla/5.0"}

for link in links:
    # Fetch with urllib instead of requests.get(link, headers=headers)
    req = urllib.request.Request(link, None, headers)
    with urllib.request.urlopen(req) as res:
        html = res.read()
    soup = BeautifulSoup(html, "lxml")
    # Same selector as in the question
    total_result = soup.select_one(".result-count").text
    print(total_result)
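
If you want to compare the counts as numbers rather than as strings such as "17,575 results", something like the following might help. This is only a minimal sketch: the helper name parse_result_count is hypothetical (it is not part of the question or of any library), and it assumes the label always looks like a number followed by some text.

```python
import re
from typing import Optional


def parse_result_count(text: Optional[str]) -> Optional[int]:
    """Turn a label such as '17,575 results' into the integer 17575.

    Returns None when there is no text or the text contains no number.
    """
    if not text:
        return None
    # First run of digits, allowing thousands separators like "17,575"
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group(0).replace(",", ""))


if __name__ == "__main__":
    for label in ("10 results", "8,345 results", None):
        print(parse_result_count(label))
```

In the loop above you could keep the element returned by soup.select_one(".result-count") in a variable and pass its text only when the element is not None, so a page without that element gives None instead of raising an AttributeError.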

Hope it helps, and have a good day.

Upvotes: 1
