Hary2

Reputation: 61

Headers for the Zillow website: where do I get them?

The code below extracts data from a Zillow for-sale search page.

My first question is: where do people get the header information?

My second question is: how do I know when I need headers? For some other sites, such as Cars.com, I don't need to pass headers=headers and I still get the data correctly.

Thank you for your help. HHC

import requests
from bs4 import BeautifulSoup

url ='https://www.zillow.com/baltimore-md-21201/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%2221201%22%2C%22mapBounds%22%3A%7B%22west%22%3A-76.67377295275878%2C%22east%22%3A-76.5733510472412%2C%22south%22%3A39.26716345016057%2C%22north%22%3A39.32309233550334%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A66811%2C%22regionType%22%3A7%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A14%7D'

# Browser-like headers, so the request does not identify itself as python-requests
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    'referer': 'https://www.zillow.com/new-york-ny/rentals/2_p/?searchQueryState=%7B%22pagination',
}

raw_page = requests.get(url, headers=headers)
status = raw_page.status_code
print(status)  # 200 means the page was served; a 403 usually means the request was refused

# Load the page content into Beautiful Soup
page = raw_page.content
page_soup = BeautifulSoup(page, 'html.parser')
print(page_soup)
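Some context for the second question: when you call requests.get() without a headers argument, requests still sends a few default headers, and its User-Agent identifies the client as python-requests. This is a minimal sketch that prints those defaults without sending any request (it only assumes requests is installed):

```python
import requests

# default_headers() returns the headers requests adds to every request
# unless you override them with your own headers dict.
defaults = requests.utils.default_headers()
for name, value in defaults.items():
    print(f"{name}: {value}")

# The default User-Agent looks like "python-requests/<version>", which makes
# it easy for a site to recognise the request as coming from a script.
print(defaults["User-Agent"].startswith("python-requests/"))
```

Sites that don't filter on User-Agent (apparently Cars.com, in the question's experience) accept this as-is; sites that do will reject it or serve a different page.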

Upvotes: 0

Views: 672

Answers (2)

Salim muneer lala

Reputation: 99

From your browser, go to http://myhttpheader.com/; it shows the headers your browser sends.
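To compare what your browser sends with what your script would send, you can have requests build the request without sending it. This is a sketch; the URL is shortened from the question's and the User-Agent is the one the question uses, so substitute the values you see in your own browser:

```python
import requests

url = "https://www.zillow.com/baltimore-md-21201/"
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
}

with requests.Session() as session:
    # prepare_request() merges the session's default headers with ours,
    # the same way session.get() would, but nothing goes over the network.
    prepared = session.prepare_request(requests.Request("GET", url, headers=headers))

for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

The printed list is what the server would receive; anything your browser sends that is missing here is a candidate header to add.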

Second, you only need to provide headers when a website such as Zillow blocks you from scraping its data.

Check this picture: [image not available]
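That "only when blocked" rule can be sketched as: try a plain request first and retry with browser headers only if the site refuses it. The names fetch, BROWSER_HEADERS and BLOCKED_STATUSES are hypothetical, and treating 403/429 as "blocked" is an assumption; some sites return 200 with a CAPTCHA page instead, which this check would miss.

```python
import requests

# Hypothetical browser-like headers; copy real ones from your own browser.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
}

# Assumption: a block shows up as one of these HTTP status codes.
BLOCKED_STATUSES = {403, 429}


def fetch(url, get=requests.get):
    """Send a plain request; retry with browser headers only if it looks blocked.

    `get` is injectable so the logic can be exercised without network access.
    """
    response = get(url)
    if response.status_code in BLOCKED_STATUSES:
        response = get(url, headers=BROWSER_HEADERS)
    return response
```

With this, a site like Cars.com would be fetched with the default headers, and the browser headers would only be used where the first attempt is refused.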

Upvotes: 0

Musa

Reputation: 97717

You can get the headers by visiting the site in your browser and opening the Network tab of the developer tools. Select a request and you can see the headers that were sent with it.
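If you copy the request headers out of the Network tab as text, a small helper can turn them into a dict for requests. This sketch assumes the copied text has one "name: value" pair per line (other copy formats would need different parsing); parse_raw_headers is a hypothetical name, and lines starting with ":" (HTTP/2 pseudo-headers such as :authority) are skipped because they are not ordinary headers.

```python
def parse_raw_headers(raw):
    """Turn 'name: value' lines copied from the browser into a headers dict."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers like ":authority".
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so values such as URLs stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


raw = """
:authority: www.zillow.com
accept-language: en-US,en;q=0.9
user-agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36
referer: https://www.zillow.com/
"""
headers = parse_raw_headers(raw)
print(headers)
```

The result can be passed straight to requests.get(url, headers=headers).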

Some websites don't serve bots. To make them think you're not a bot, you set the User-Agent header to one that a browser uses; some sites require more headers before your request passes their bot check. You can see all the headers being sent in the developer tools, and you can test different headers until your request succeeds.
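Testing different headers until the request succeeds can be automated roughly like this: start with no headers and add the ones copied from the browser one at a time, stopping at the first combination that works. find_working_headers is a hypothetical helper, and treating status 200 as success is an assumption (pass your own ok check if the site signals blocks differently).

```python
import requests


def find_working_headers(url, candidate, get=requests.get,
                         ok=lambda r: r.status_code == 200):
    """Return the shortest prefix of `candidate` that gets a successful response.

    Headers are added in the dict's insertion order; returns None if even the
    full set fails. `get` and `ok` are injectable for testing or custom checks.
    """
    names = list(candidate)
    for n in range(len(names) + 1):
        trial = {name: candidate[name] for name in names[:n]}
        if ok(get(url, headers=trial)):
            return trial
    return None
```

Each attempt is a real request, so keep the candidate list short; firing many requests in a row can itself get you rate-limited.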

Upvotes: 1
