Reputation: 25
I'm scraping the Monster job site with a search for "Software Developer". My aim is to print only the jobs that mention "python", and to discard all the other jobs for Java, HTML, CSS etc. However, when I run this code it prints every job on the page. To solve this I created a variable called 'search' that finds all h2 elements whose text, converted to lowercase, contains 'python'. I also created a variable called 'python_jobs' that holds all the job listings on the page.
Then I wrote a "for" loop that is meant to pick out every instance where 'search' is found in 'python_jobs'. However, this gives the same result as before and prints every job listing on the page anyway. Any suggestions?
import requests
from bs4 import BeautifulSoup
URL = "https://www.monster.com/jobs/search/?q=Software-Developer"
page = requests.get(URL)
print(page)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="ResultsContainer")
search = results.find_all("h2", string=lambda text: "python" in text.lower())
python_jobs = results.find_all("section", class_="card-content")
print(len(search))
for search in python_jobs:
    title = search.find("h2", class_="title")
    company = search.find("div", class_="company")
    if None in (title, company):
        continue
    print(title.text.strip())
    print(company.text.strip())
    print()
Upvotes: 0
Views: 324
Reputation: 142671
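Your problem is that you have two separate lists, search and python_jobs, which are not related to each other. Later you don't even use the search list: the loop for search in python_jobs: rebinds the name search to each job card, so every listing is printed. Instead, go through every item in python_jobs and look for python inside that item.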
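To see the effect without any scraping, here is a minimal sketch that uses plain dictionaries as a stand-in for the job cards. The sample data and the names jobs, printed_wrong and printed_right are made up for illustration; only the loop structure mirrors the question.

```python
# Hypothetical stand-in data: plain dicts instead of BeautifulSoup tags.
jobs = [
    {"title": "Senior Python Developer", "company": "Acme"},
    {"title": "Java Engineer", "company": "Globex"},
    {"title": "Python Backend Engineer", "company": "Initech"},
]

# A filtered list, playing the role of `search` in the question.
search = [job for job in jobs if "python" in job["title"].lower()]

# Mistake: the loop variable reuses the name `search`, so the filtered
# list is discarded and every job is visited.
printed_wrong = []
for search in jobs:
    printed_wrong.append(search["title"])

# Fix: test each item inside the loop.
printed_right = []
for job in jobs:
    if "python" in job["title"].lower():
        printed_right.append(job["title"])

print(printed_wrong)
print(printed_right)
```

Applied to the scraper, the same per-item check looks like this: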
import requests
from bs4 import BeautifulSoup
URL = "https://www.monster.com/jobs/search/?q=Software-Developer"
page = requests.get(URL)
print(page)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="ResultsContainer")
all_jobs = results.find_all("section", class_="card-content")
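for job in all_jobs:
    # Look for a "python" heading inside this one job card only.
    # `text` may be None for some tags, so guard before calling .lower().
    python = job.find("h2", string=lambda text: text and "python" in text.lower())
    if python:
        title = job.find("h2", class_="title")
        company = job.find("div", class_="company")
        print(title.text.strip())
        print(company.text.strip())
        print()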
or
import requests
from bs4 import BeautifulSoup
URL = "https://www.monster.com/jobs/search/?q=Software-Developer"
page = requests.get(URL)
print(page)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="ResultsContainer")
all_jobs = results.find_all("section", class_="card-content")
for job in all_jobs:
    title = job.find("h2")
    if title:
        title = title.text.strip()
        if 'python' in title.lower():
            company = job.find("div", class_="company").text.strip()
            print(title)
            print(company)
            print()
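If you want to try the filtering logic without hitting the site, the sketch below runs the same idea against a small inline HTML string. The markup in SAMPLE_HTML is invented and only imitates the structure the code above assumes (a ResultsContainer id, card-content sections, an h2 title and a company div), and find_jobs is a hypothetical helper name, not part of any library. It also skips cards with a missing title or company and returns an empty list when the container is not found, since soup.find returns None in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the structure assumed above.
SAMPLE_HTML = """
<div id="ResultsContainer">
  <section class="card-content">
    <h2 class="title"><a href="#">Python Developer</a></h2>
    <div class="company">Acme</div>
  </section>
  <section class="card-content">
    <h2 class="title"><a href="#">Java Developer</a></h2>
    <div class="company">Globex</div>
  </section>
  <section class="card-content">
    <h2 class="title"><a href="#">Senior PYTHON Engineer</a></h2>
  </section>
</div>
"""


def find_jobs(html, keyword):
    """Return (title, company) pairs whose title contains keyword."""
    soup = BeautifulSoup(html, "html.parser")
    results = soup.find(id="ResultsContainer")
    if results is None:
        # Container not present in this document: nothing to report.
        return []
    matches = []
    for job in results.find_all("section", class_="card-content"):
        title = job.find("h2", class_="title")
        company = job.find("div", class_="company")
        if title is None or company is None:
            # Skip incomplete cards instead of raising AttributeError.
            continue
        title_text = title.get_text(strip=True)
        if keyword.lower() in title_text.lower():
            matches.append((title_text, company.get_text(strip=True)))
    return matches


for title, company in find_jobs(SAMPLE_HTML, "python"):
    print(title)
    print(company)
    print()
```

Returning a list instead of printing inside the loop keeps the filter easy to check, and the same function can be called with page.content once the real page structure is confirmed.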
Upvotes: 1