Reputation: 47
I'm super new to programming, so sorry for any bad practices.
I'm trying to make a web scraper that scrapes indeed.com for job listings in my field. I was following some articles about it online and thought I understood it, but now I think I've misunderstood something.
I'm attempting to scrape the location of the job, which I found in the HTML as follows: [image: html code]
In order to scrape that location I was told to do as follows:
    # grabbing location name
    c = div.find_all(name="span", attrs={"class": "location"})
    for span in c:
        print(span.text)
        job_post.append(span.text)
However, I noticed that sometimes the page puts the location in a div rather than a span, so I edited the code as follows:
    def find_location_for_job(self, div, job_post, city):
        div2 = div.find_all(name="div", attrs={"class": "sjcl"})
        print(div2)
        try:
            div3 = div2.find_all(name="div", attrs={"class": "location accessible-contrast-color-location"})
            job_post.append(div3.text)
        except:
            span = div2.find_all(name="span", attrs={"class": "location accessible-contrast-color-location"})
            job_post.append(span.text)
        print(job_post)
However, half of the time it's still saying it can't find the text in the div/span, even when I search the posting and see it labeled as one or the other.
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Note that I abandoned the code I had found because it doesn't capture the results when a div is used instead of a span. So my next troubleshooting step was to sort of combine my approach with theirs, as follows:
    def find_location_for_job(self, div, job_post, city):
        div2 = div.find_all(name="div", attrs={"class": "sjcl"})
        try:
            div3 = div2.find_all(name="div", attrs={"class": "location accessible-contrast-color-location"})
            for span in div3:
                job_post.append(span.text)
        except:
            div4 = div.findAll("span", attrs={"class": "location accessible-contrast-color-location"})
            for span in div4:
                job_post.append(span.text)
However, this method throws the entire list of locations into every entry it scrapes (it scrapes 10 postings per city, so this method throws 10 locations into each of the 10 posting entries).
Can anyone tell me where I'm having the brain fart?
Edit: Full code in pastebin: https://pastebin.com/0LLb9ZcU
Upvotes: 1
Views: 104
Reputation: 1607
div2 is a ResultSet, because that is what BeautifulSoup's find_all method returns. You need to iterate over the ResultSet and search for the inner fields like so:
    def find_location_for_job(self, div, job_post, city):
        div2 = div.find_all(name="div", attrs={"class": "sjcl"})
        for sjcl_div in div2:
            # Search inside the current element; calling find_all on div2
            # itself would raise the same AttributeError again.
            div3 = sjcl_div.find_all(name="div", attrs={"class": "location accessible-contrast-color-location"})
            div4 = div.find_all("span", attrs={"class": "location accessible-contrast-color-location"})
            if div3:
                for span in div3:
                    job_post.append(span.text)
            elif div4:
                for span in div4:
                    job_post.append(span.text)
            else:
                print("Uh-oh, couldn't find the tags!")
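If you only expect one location per posting, a possibly simpler variant is to call find() on each posting, since find() returns a single Tag (or None) rather than a ResultSet. Below is a minimal sketch of that idea. The sample markup, the "row" class used to separate postings, and the helper name find_location are all made up for illustration, so the real Indeed HTML may well differ. It also guesses at why every location ended up in every entry: if the element you pass in covers the whole results page rather than a single posting, find_all on it will match every posting's location.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two postings; the real page may differ.
SAMPLE_HTML = """
<div class="row">
  <div class="sjcl">
    <span class="location accessible-contrast-color-location">Austin, TX</span>
  </div>
</div>
<div class="row">
  <div class="sjcl">
    <div class="location accessible-contrast-color-location">Dallas, TX</div>
  </div>
</div>
"""


def find_location(card):
    """Return the location text for one posting, or None if it is missing."""
    # A list of names matches either tag type, and class_="location" matches
    # any tag whose class list contains "location", so a single find() call
    # covers both the <div> and the <span> variants.
    tag = card.find(["div", "span"], class_="location")
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a ResultSet (a list subclass), so iterate over it and
# search inside each posting separately.
cards = soup.find_all("div", class_="row")
locations = [find_location(card) for card in cards]
print(locations)  # ['Austin, TX', 'Dallas, TX']
```

Because each call is scoped to one posting, each entry gets at most one location, and a missing tag shows up as None instead of raising an exception.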
Upvotes: 2