Python requests web scraping: how to detect non-existent returned pages?

Question

I'm scraping average salary data for a list of jobs to make infographics. If the job can be found, like "programmer", I get a status code 200 and the page I end up on has the same URL as the one in the script.

import requests

job_url: str = "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State"
job_response = requests.get(job_url, timeout=10)
print(job_response)

If the lookup fails, as it does below for "Youtuber", I want to display an error message to the user. However, I still get a status code 200. When I try this manually, the site redirects me to a page like "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Youtuber-Salary-by-State?ind=null".

null_url: str = "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Youtuber-Salary-by-State"
null_response = requests.get(null_url, timeout=10)

How can I detect in code that the query is being redirected to an empty page? Do I need to use another library?

Sers · Accepted Answer

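You can disable redirection and check the response yourself. Any one of these checks is enough:

import requests

null_url = "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Youtuber-Salary-by-State"
null_response = requests.get(null_url, timeout=10, allow_redirects=False)

# For a job that is not found, the site replies with a 301 redirect rather than a 404.
if null_response.status_code == 301:
    print("Not found")

# The body of the redirect response contains this phrase (this depends on the server).
if "Moved Permanently" in null_response.text:
    print("Not found")

# .next is the prepared request for the redirect target, or None if there was no redirect.
if null_response.next is not None and "ind=null" in null_response.next.url:
    print("Not found")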

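Or with redirections enabled (the default), inspect the final URL and the redirect history:

null_url = "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Youtuber-Salary-by-State"
null_response = requests.get(null_url, timeout=10)

# .url is the final URL after all redirects were followed.
if "ind=null" in null_response.url:
    print("Not found")

# .history is empty when nothing was redirected, so check it before indexing.
if null_response.history and null_response.history[0].status_code == 301:
    print("Not found")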
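
If you want a single check to call for every job in your list, the ideas above could be wrapped up roughly like this. This is only a sketch: the names is_null_redirect and job_page_exists are made up for illustration, and it assumes the site keeps signalling a missing job by redirecting to a URL whose query string contains ind=null. Parsing the query string instead of using a substring test avoids false matches on other parameters that happen to contain the same text.

```python
from urllib.parse import parse_qs, urlsplit


def is_null_redirect(final_url: str, redirect_count: int) -> bool:
    """Return True if the request was redirected and the final URL has ind=null.

    Hypothetical helper: assumes the site marks a missing job with the
    query parameter ind=null, as observed in the question.
    """
    query = parse_qs(urlsplit(final_url).query)
    return redirect_count > 0 and query.get("ind") == ["null"]


def job_page_exists(url: str) -> bool:
    """Fetch url and report whether a real salary page came back (hypothetical helper)."""
    import requests  # imported here so is_null_redirect stays standard-library only

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx responses
    # response.url is the final URL; response.history lists the redirects followed.
    return not is_null_redirect(response.url, len(response.history))
```

You would then call job_page_exists(null_url) and show your error message to the user when it returns False.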
