Imo

Reputation: 1475

AttributeError: 'ResultSet' object has no attribute 'find_all' Beautifulsoup

I don't understand why I get this error.

I have a fairly simple function:

import requests
from bs4 import BeautifulSoup

def scrape_a(url):
  r = requests.get(url)
  soup = BeautifulSoup(r.content)
  news =  soup.find_all("div", attrs={"class": "news"})
  for links in news:
    link = news.find_all("href")
    return link

Here is the structure of the webpage I am trying to scrape:

<div class="news">
<a href="www.link.com">
<h2 class="heading">
heading
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>

Upvotes: 1

Views: 8619

Answers (1)

Martijn Pieters

Reputation: 1122232

You are doing two things wrong:

  • You are calling find_all on news, which is a ResultSet (a list of matching elements), not a single element; presumably you meant to call it on the links object, one element in that result set.

  • There are no <href ...> tags in your document, so searching with find_all('href'), which looks for tags named href, is not going to get you anything. You only have tags with an href attribute.
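To see both problems without making a network request, here is a minimal sketch that parses a condensed copy of the question's markup from a string (the `SAMPLE_HTML` name and the explicit `"html.parser"` argument are assumptions for the example; it needs `beautifulsoup4` installed):

```python
from bs4 import BeautifulSoup

# Condensed copy of the question's markup, parsed from a string instead of fetched.
SAMPLE_HTML = """
<div class="news">
<a href="www.link.com">
<h2 class="heading">heading</h2>
<div class="teaserImg"><img alt="" border="0" height="124" src="/image"></div>
<p> text </p>
</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
news = soup.find_all("div", attrs={"class": "news"})

# find_all() returns a ResultSet, which is a list of Tag objects, not a Tag.
print(isinstance(news, list))  # True

# Calling a Tag method on the list itself fails, as in the question.
try:
    news.find_all(href=True)
except AttributeError as exc:
    print("AttributeError:", exc)

first = news[0]  # one Tag taken from the ResultSet

# find_all("href") looks for <href> *tags*; the document has none.
print(first.find_all("href"))  # []

# find_all(href=True) looks for any tag that *has* an href attribute.
print([tag.name for tag in first.find_all(href=True)])  # ['a']
```

The exact wording of the AttributeError message differs between Beautiful Soup versions, but the exception type is the same.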

You could correct your code to:

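import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    news = soup.find_all("div", attrs={"class": "news"})
    for links in news:
        # links is a single Tag, so find_all works here; href=True
        # matches any tag that has an href attribute
        link = links.find_all(href=True)
        return link  # returns after the first news div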

to do what I think you tried to do.
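Because the return sits inside the loop, only the first news div is ever examined. If you want the links from every matching div, one option is to collect them and return after the loop. A minimal sketch, assuming a hypothetical helper `news_links_from_html` that takes an HTML string so it can be tried without requests (the second div in the sample is made up for illustration):

```python
from bs4 import BeautifulSoup


def news_links_from_html(html):
    """Return every tag with an href attribute inside any div.news (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for news_div in soup.find_all("div", attrs={"class": "news"}):
        # Call find_all on each Tag, never on the ResultSet itself.
        links.extend(news_div.find_all(href=True))
    return links  # returned only after every div has been searched


html = """
<div class="news"><a href="www.link.com"><h2 class="heading">heading</h2></a></div>
<div class="news"><a href="www.other.com"><h2 class="heading">other</h2></a></div>
"""
print([link["href"] for link in news_links_from_html(html)])
# ['www.link.com', 'www.other.com']
```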

I'd use a CSS selector:

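import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # every element with an href attribute anywhere inside a div.news
    news_links = soup.select("div.news [href]")
    if news_links:
        return news_links[0]
    # falls through and returns None when nothing matches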
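The selector div.news [href] combines a class selector, a descendant combinator (the space) and an attribute-presence selector, so it matches any element that has an href attribute anywhere inside a div with class news. A minimal offline sketch of that behaviour, parsing markup from a string (the extra link outside the div is made up to show that it is skipped) and using `select_one()`, which returns the first match or `None` and is available in Beautiful Soup 4.4 and later:

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div class="news">
<a href="www.link.com"><h2 class="heading">heading</h2><p> text </p></a>
</div>
<a href="www.outside.com">not inside div.news</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a list of every match; the link outside div.news is not included.
matches = soup.select("div.news [href]")
print([tag["href"] for tag in matches])  # ['www.link.com']

# select_one() returns only the first match, or None when nothing matches,
# which replaces the explicit emptiness check before news_links[0].
first = soup.select_one("div.news [href]")
print(first["href"])  # www.link.com
print(soup.select_one("div.missing [href]"))  # None
```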

If you want to return the value of the href attribute (the link itself), you need to extract that too, of course:

return news_links[0]['href']

If you need all the link objects rather than just the first, simply return news_links, or use a list comprehension to extract the URLs:

return [link['href'] for link in news_links]
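To keep the parsing testable without a network connection, the list-comprehension version can be split into a function that takes the HTML itself; inside scrape_a you would call it with r.content. The name `extract_news_urls` is hypothetical:

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div class="news">
<a href="www.link.com"><h2 class="heading">heading</h2></a>
<div class="teaserImg"><img alt="" src="/image"></div>
</div>
"""


def extract_news_urls(html):
    """Return the href value of every element with an href inside a div.news."""
    soup = BeautifulSoup(html, "html.parser")
    # The [href] part of the selector guarantees the attribute exists,
    # so link["href"] cannot raise KeyError here.
    return [link["href"] for link in soup.select("div.news [href]")]


print(extract_news_urls(SAMPLE_HTML))  # ['www.link.com']

# Without such a selector, tag.get("href") is the safer lookup: it returns None
# for a tag that lacks the attribute, whereas tag["href"] raises KeyError.
img = BeautifulSoup(SAMPLE_HTML, "html.parser").find("img")
print(img.get("href"))  # None
```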

Upvotes: 5
