Reputation: 333
I realize this is probably incredibly straightforward but please bear with me. I'm trying to use beautifulsoup 4 to scrape a website that has a list of blog posts for the urls of those posts. The tag that I want is within an tag. There are multiple tags that include a header and then a link that I want to capture. This is the code I'm working with:
    with io.open('TPNurls.txt', 'a', encoding='utf8') as logfile:
        snippet = soup.find_all('p', class="postbody")
        for link in snippet.find('a'):
            fulllink = link.get('href')
            logfile.write(fulllink + "\n")
The error I'm getting is:
AttributeError: 'ResultSet' object has no attribute 'find'
I understand that means "head" is a set and beautifulsoup doesn't let me look for tags within a set. But then how can I do this? I need it to find the entire set of tags and then look for the tag within each one and then save each one on a separate line to a file.
Upvotes: 2
Views: 552
Reputation: 9657
In your code,
    snippet = soup.find_all('p', class="postbody")
    for link in snippet.find('a'):
snippet is a bs4.element.ResultSet object, which is why you get this error: a ResultSet has no find method. The elements of that ResultSet, however, are bs4.element.Tag objects, and you can call find on each of them.
Change your code like this,
    snippet = soup.find_all("p", {"class": "postbody"})
    for link in snippet:
        if link.find('a'):
            fulllink = link.a['href']
            logfile.write(fulllink + "\n")
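If you want something you can run on its own, here is a minimal sketch of the same idea. The HTML string and the example.com URLs are made up for illustration, and collect_links and save_links are hypothetical helper names, not part of Beautiful Soup. It assumes each paragraph holds at most one link you care about.

```python
import io
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real blog page.
html = """
<p class="postbody"><b>First post</b> <a href="http://example.com/1">read</a></p>
<p class="postbody"><b>No link here</b></p>
<p class="postbody"><b>Third post</b> <a href="http://example.com/3">read</a></p>
"""

soup = BeautifulSoup(html, "html.parser")


def collect_links(soup):
    """Return the href of the first <a> inside each <p class="postbody">."""
    links = []
    # find_all() returns a ResultSet; iterate over it to reach each Tag.
    for paragraph in soup.find_all("p", {"class": "postbody"}):
        anchor = paragraph.find("a")  # find() is available on a single Tag
        if anchor is not None and anchor.get("href"):
            links.append(anchor["href"])
    return links


def save_links(links, path):
    """Append one URL per line to the file at path."""
    with io.open(path, "a", encoding="utf8") as logfile:
        for href in links:
            logfile.write(href + "\n")


links = collect_links(soup)
print(links)
```

Calling save_links(links, 'TPNurls.txt') would then append the collected URLs to the file, one per line. Paragraphs without a link are skipped instead of raising an error.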
Upvotes: 0
Reputation: 474121
The actual reason for the error is that snippet is the result of a find_all() call, which is basically a list of results, and there is no find() method available on it. Instead, you meant:
    snippet = soup.find('p', class_="postbody")
    for link in snippet.find_all('a'):
        fulllink = link.get('href')
        logfile.write(fulllink + "\n")
Also, note the use of class_ here: class is a reserved keyword in Python and cannot be used as a keyword argument. See Searching by CSS class for more info.
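To make the class_ point concrete, here is a small sketch comparing the keyword form with the attrs dictionary form. The markup is invented for the example. Note that a class search matches any element that has postbody among its classes, not only an exact attribute value.

```python
from bs4 import BeautifulSoup

# Made-up markup: two paragraphs carry the "postbody" class, one does not.
html = (
    '<p class="postbody">a</p>'
    '<p class="other">b</p>'
    '<p class="postbody wide">c</p>'
)
soup = BeautifulSoup(html, "html.parser")

# Keyword form: the trailing underscore avoids the reserved word "class".
by_keyword = [p.get_text() for p in soup.find_all("p", class_="postbody")]

# Dictionary form: "class" is just a string key here, so no underscore is needed.
by_attrs = [p.get_text() for p in soup.find_all("p", attrs={"class": "postbody"})]

print(by_keyword)
print(by_attrs)
```

Both calls select the same paragraphs, so which one to use is mostly a matter of taste.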
Alternatively, make use of CSS selectors:
    for link in soup.select('p.postbody a'):
        fulllink = link.get('href')
        logfile.write(fulllink + "\n")
p.postbody a would match all a tags inside any p tag with class postbody.
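As a rough illustration of the selector approach, the sketch below runs it against invented markup. The a[href] attribute selector is my addition, used here so that anchors without an href are left out; it is an assumption about what you want, so drop it if you need every a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical page: one link outside any post, two inside a post,
# and one anchor that has no href attribute at all.
html = """
<div><a href="/nav">nav link outside any post</a></div>
<p class="postbody"><a href="/post-1">one</a> <a href="/post-1#comments">comments</a></p>
<p class="postbody"><a name="anchor-only">no href</a></p>
"""
soup = BeautifulSoup(html, "html.parser")

# Select every <a> that has an href and sits inside a <p class="postbody">.
hrefs = [a["href"] for a in soup.select("p.postbody a[href]")]

print(hrefs)
```

Unlike find(), which stops at the first match, select() walks the whole document, so links from every matching paragraph are collected in one call.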
Upvotes: 4