I'm building a web scraper that returns the names of cafes, which appear on the website like this: <h2 class="venue-title" itemprop="name">Prior</h2>
However, it returns this error:
"ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()? [Finished in 0.699s]
Here is the code:
from bs4 import BeautifulSoup
import requests
url = 'https://www.broadsheet.com.au/melbourne/guides/best-cafes-thornbury'
response = requests.get(url, timeout=5)
soup_cafe_list = BeautifulSoup(response.content, "html.parser")
type(soup_cafe_list)
cafes = soup_cafe_list.findAll('h2', attrs_={"class":"venue-title"}).text
print(cafes)
I have tried a whole range of things to figure it out. I suspect the problem is in the findAll arguments: cafes = soup_cafe_list.findAll('h2', attrs_={"class":"venue-title"}).text
because when I run it as cafes = soup_cafe_list.findAll('h2', class_="venue-title")
instead, it sort of works, except that it doesn't return the items cleaned of their HTML, which I believe is what .text
should do.
Another thing I noticed in the traceback is that it may be referring to a different directory for bs4. Could this have anything to do with it? I started off using Jupyter and am now on Atom, so I may have installed bs4 incorrectly:
File "/Users/[xxxxxxxx]/Desktop/Coding/amvpscraper/webscraper.py", line 10, in <module>
    cafes = soup_cafe_list.findAll('h2', attrs_={"class":"venue-title"}).text
File "/Users/[xxxxxxxx]/opt/anaconda3/lib/python3.7/site-packages/bs4/element.py", line 2081, in __getattr__
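The second frame is inside bs4's own element.py, so bs4 was imported and ran. To confirm which interpreter and which bs4 installation a script actually uses (for example, Anaconda's Python versus another one picked up by Atom), a minimal standard-library-only check could look like this:

```python
import sys
import importlib.util

# The Python interpreter running this script (e.g. an Anaconda path vs. a system Python).
print("interpreter:", sys.executable)

# Locate bs4 without importing it; None means bs4 is not installed for this interpreter.
spec = importlib.util.find_spec("bs4")
print("bs4 location:", spec.origin if spec else None)
```

Running it from both Jupyter and Atom and comparing the two outputs shows whether they use the same environment.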
Not sure if I am doing something else wrong...
The error indicates that findAll returns a list of elements (a ResultSet), and that list does not have a text attribute. Save the result as a list (without .text) and replace attrs_ with attrs, because attrs_ is not the attrs parameter that findAll expects:
cafes = soup_cafe_list.findAll('h2', attrs={"class":"venue-title"})
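To see the difference the error message hints at, here is a small offline sketch. The inline HTML is an assumed stand-in for the real page, modelled on the <h2> from the question. find_all() gives a ResultSet (a list of tags), while find() gives a single Tag that does have .text:

```python
from bs4 import BeautifulSoup

# Assumed sample markup standing in for the Broadsheet page.
html = """
<h2 class="venue-title" itemprop="name">Prior</h2>
<h2 class="venue-title" itemprop="name">Rat the Cafe</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which is a list subclass holding every match.
all_titles = soup.find_all("h2", attrs={"class": "venue-title"})
print(type(all_titles).__name__, len(all_titles))  # ResultSet 2

# find() returns only the first matching Tag, and a Tag has .text.
first_title = soup.find("h2", class_="venue-title")
print(first_title.text)  # Prior

# Asking the ResultSet itself for .text raises the AttributeError from the question.
try:
    _ = all_titles.text
except AttributeError as exc:
    print("AttributeError:", exc)
```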
Then iterate through the list and get the text of each element. You can do that with a list comprehension:
cafes = [el.text for el in cafes]
Edit: a list comprehension is a compact form of a for loop, so you could also write:
res_list = []
for el in cafes:  # cafes is still the findAll result here, not the comprehension's output
    res_list.append(el.text)
Additionally, you may want to add a try-except clause, or a check for a non-empty text value, inside the loop to handle elements that have no text.
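A minimal sketch of the non-empty check, again on assumed inline HTML where the last heading is deliberately empty. It uses get_text(strip=True), which also trims surrounding whitespace, in place of .text:

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the third heading has no text.
html = """
<h2 class="venue-title" itemprop="name">Prior</h2>
<h2 class="venue-title" itemprop="name">  Rat the Cafe  </h2>
<h2 class="venue-title" itemprop="name"></h2>
"""
soup = BeautifulSoup(html, "html.parser")

res_list = []
for el in soup.find_all("h2", class_="venue-title"):
    # strip=True removes leading/trailing whitespace; an empty result means no text.
    name = el.get_text(strip=True)
    if name:
        res_list.append(name)

print(res_list)  # ['Prior', 'Rat the Cafe']
```

With .text instead, an empty heading would simply yield '' rather than raising, so a plain emptiness check is usually enough here.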
Output:
['Prior',
'Rat the Cafe',
'Ampersand Coffee and Food',
'Umberto Espresso Bar',
'Brother Alec',
'Short Round',
'Jerry Joy',
'The Old Milk Bar',
'Little Henri',
'Northern Soul']
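Putting the fix together with the original script, one possible arrangement is sketched below. The function names parse_cafe_names and fetch_cafe_names are hypothetical, and the raise_for_status() call is an addition not in the original code. The live page's markup may have changed since this answer was written, so the output above is not guaranteed:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.broadsheet.com.au/melbourne/guides/best-cafes-thornbury"


def parse_cafe_names(html):
    """Return the text of every <h2 class="venue-title"> in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    titles = soup.find_all("h2", attrs={"class": "venue-title"})  # a ResultSet, not a Tag
    return [el.get_text(strip=True) for el in titles]


def fetch_cafe_names(url=URL):
    """Download the page and extract the cafe names (needs network access)."""
    response = requests.get(url, timeout=5)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    return parse_cafe_names(response.content)
```

Calling print(fetch_cafe_names()) performs the request. Keeping the parsing in its own function means it can be tried on saved HTML without hitting the site.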