Reputation: 13
I'm learning the basics of web-scraping and am using Indeed as my testing ground.
I'm excluding sections of my code that I'm happy with to avoid a lengthy post. The "indeed(dot)com" portion of the print statement will be substituted with "site" so my post does not get auto-removed or flagged.
The related_jobs variable is of type bs4.element.Tag. My code is as follows:
    for job in jobs:
        related_jobs = job.find('span', class_='mat')
        print(f"All Postings By {company_name}: site{related_jobs}")
Here is one of the outputs for the print statement:
My issue is as follows: I want to prepend "site" to the href of the first 'a' tag inside the span tag. When I try to implement that this way:
related_jobs = job.find('span', class_ = 'mat').a['href']
the output for the first listing is exactly how I want it, but the loop does not continue past it. I receive this error: "AttributeError: 'NoneType' object has no attribute 'a'".
My question: Is there a way to have my for loop continue through every listing on the page? If not, is there a string method that I can use to grab the first 'a' tag?
Upvotes: 0
Views: 406
Reputation: 21436
In some iterations of your loop, job.find() matches no span and returns None, so accessing .a on the result raises the AttributeError. You can either use a try/except statement or do an if check:
    for job in jobs:
        related_jobs = job.find('span', class_='mat')
        if related_jobs is None:
            # no matching span in this listing - skip this iteration
            continue
        related_jobs = related_jobs.a['href']
        print(f"All Postings By {company_name}: site{related_jobs}")
Or, a more Pythonic approach is to use a try/except statement:
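    for job in jobs:
        try:
            related_jobs = job.find('span', class_='mat').a['href']
        except (AttributeError, KeyError, TypeError):
            # AttributeError: no matching span; TypeError: span without an 'a' tag;
            # KeyError: 'a' tag without an href
            continue
        print(f"All Postings By {company_name}: site{related_jobs}")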
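As a side note on prepending the site to the href: if the href values are relative paths, urllib.parse.urljoin from the standard library may be safer than plain string concatenation, since it also leaves already-absolute URLs untouched. Below is a minimal sketch under assumptions: BASE_URL, absolute_link, the company name and the sample href values are all made up for illustration and are not taken from the question.

```python
from urllib.parse import urljoin

# Hypothetical base URL standing in for "site"; replace it with the real one.
BASE_URL = "https://www.example.com"


def absolute_link(href, base=BASE_URL):
    """Return an absolute URL for href, or None when there is no href."""
    if not href:
        return None
    return urljoin(base, href)


# Made-up href values of the kind a scraped 'a' tag might hold.
for href in ["/jobs?q=acme", None, "https://www.example.com/cmp/acme"]:
    link = absolute_link(href)
    if link is None:
        continue  # nothing to print for this listing
    print(f"All Postings By Acme: {link}")
```

In the loop above, you could pass the extracted href to a helper like this instead of formatting it directly after "site".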
Upvotes: 1