Reputation: 43
I get an error when I use a for loop in my web scraping code.
Here is my code for the app.py file:
page_content = requests.get("http://books.toscrape.com/").content
parser = BookParser(page_content)
containers = parser.Content()
results = []
for container in containers:
    name = container.getName()
    link = container.getLink()
    price = container.getPrice()
    rating = container.getRating()
    results.append({'name': name,
                    'link': link,
                    'price': price,
                    'rating': rating
                    })
print(results[4])
and this is the code for the function that is called:
class BookParser(object):
    RATINGS = {
        'One': 1,
        'Two': 2,
        'Three': 3,
        'Four': 4,
        'Five': 5
    }

    def __init__(self, page):
        self.soup = BeautifulSoup(page, 'html.parser')

    def Content(self):
        return self.soup.find_all("li", attrs={"class": 'col-xs-6'})

    def getName(self):
        return self.soup.find('h3').find('a')['title']

    def getLink(self):
        return self.soup.find('h3').find('a')['href']

    def getPrice(self):
        locator = BookLocator.PRICE
        price = self.soup.select_one(locator).string
        pattern = r"[0-9\.]*"
        validator = re.findall(pattern, price)
        return float(validator[1])

    def getRating(self):
        locator = BookLocator.STAR_RATING
        rating = self.soup.select_one(locator).attrs['class']
        rating_number = BookParser.RATINGS.get(rating[1])
        return rating_number
and finally, this is the error:
Traceback (most recent call last):
  File "c:\Users\Utkarsh Kumar\Documents\Projects\milestoneP4\app.py", line 13, in <module>
    name = container.getName()
TypeError: 'NoneType' object is not callable
I don't understand why the getName() function is returning a NoneType.
Any help would be highly appreciated, as I am pretty new to web scraping.
PS: Using it without the for loop works fine, something like this:
name = parser.getName()
print(name)
Upvotes: 0
Views: 64
Reputation: 11091
containers = parser.Content() gives you a list of BS4 elements, not a BookParser instance. You can verify this using print(type(containers)).
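As for why the message is "'NoneType' object is not callable" rather than an AttributeError: on a BS4 Tag, accessing an unknown attribute is treated as a search for a child tag with that name, so container.getName behaves like container.find("getName") and comes back as None. Here is a minimal sketch of that, using a trimmed, made-up snippet of markup in place of the real page:

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for one book entry (hypothetical markup, not the full page).
html = """
<ol>
  <li class="col-xs-6 col-sm-4">
    <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
           title="A Light in the Attic">A Light in the ...</a></h3>
  </li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")
containers = soup.find_all("li", attrs={"class": "col-xs-6"})
container = containers[0]

# The list is a ResultSet and each item is a Tag, not a BookParser.
print(type(containers).__name__)
print(type(container).__name__)

# Attribute access on a Tag looks for a child tag with that name,
# so this is the same as container.find("getName").
missing = container.getName
print(missing)

# The same mechanism is what makes container.h3 work for a real child tag.
name = container.h3.a["title"]
print(name)
```

Calling container.getName() therefore ends up calling None, which is the TypeError in the traceback.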
To continue using .getName(), you can create a new class called Book, move .getName and all related methods to it, and pass in a list item returned from the .Content() method (i.e. li.col-xs-6). Then you can call book.getName().
Something like this should work:
from bs4 import BeautifulSoup

class Book:
    def __init__(self, el):
        # el is a single <li> element, not the whole page
        self.soup = el

    def getName(self):
        return self.soup.find('h3').find('a')['title']

    def getLink(self):
        ...

    def getPrice(self):
        ...

    def getRating(self):
        ...

def get_books(html: str) -> list:
    soup = BeautifulSoup(html, 'html.parser')
    return [Book(it) for it in soup.find_all("li", attrs={"class": 'col-xs-6'})]

# html is the page source, e.g. requests.get("http://books.toscrape.com/").text
for b in get_books(html):
    print(b.getName())
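If it helps, here is a rough, self-contained sketch of the same idea with the elided methods filled in. The selectors "p.price_color" and "p.star-rating" are my assumptions based on the site's markup, since the question's BookLocator is not shown, and SAMPLE_HTML is a trimmed stand-in for the downloaded page so the snippet runs offline:

```python
import re
from bs4 import BeautifulSoup

# Word-to-number map copied from the question's BookParser.RATINGS.
RATINGS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

# Trimmed copy of a single book entry; stands in for the real page source.
SAMPLE_HTML = """
<ol>
  <li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">
    <article class="product_pod">
      <p class="star-rating Three"></p>
      <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
             title="A Light in the Attic">A Light in the ...</a></h3>
      <div class="product_price"><p class="price_color">£51.77</p></div>
    </article>
  </li>
</ol>
"""


class Book:
    def __init__(self, el):
        self.soup = el  # one <li> element, not the whole page

    def getName(self):
        return self.soup.find("h3").find("a")["title"]

    def getLink(self):
        return self.soup.find("h3").find("a")["href"]

    def getPrice(self):
        # Assumed selector; the question's BookLocator.PRICE is not shown.
        text = self.soup.select_one("p.price_color").get_text()
        # Take the first number in the text, ignoring the currency symbol.
        return float(re.search(r"\d+(?:\.\d+)?", text).group())

    def getRating(self):
        # Assumed selector; the class list looks like ['star-rating', 'Three'].
        classes = self.soup.select_one("p.star-rating")["class"]
        return RATINGS.get(classes[1])


def get_books(html):
    soup = BeautifulSoup(html, "html.parser")
    return [Book(li) for li in soup.find_all("li", attrs={"class": "col-xs-6"})]


# Build the same list of dicts as the loop in the question.
results = [
    {"name": b.getName(), "link": b.getLink(),
     "price": b.getPrice(), "rating": b.getRating()}
    for b in get_books(SAMPLE_HTML)
]
print(results[0])
```

With the real site you would pass the downloaded page to get_books() instead of SAMPLE_HTML.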
Upvotes: 2
Reputation: 1180
Each book in the list is in these li elements:
<li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">
  <article class="product_pod">
    <div class="image_container">
      <a href="catalogue/a-light-in-the-attic_1000/index.html"><img src="media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg" alt="A Light in the Attic" class="thumbnail"></a>
    </div>
    <p class="star-rating Three">
      <i class="icon-star"></i>
      <i class="icon-star"></i>
      <i class="icon-star"></i>
      <i class="icon-star"></i>
      <i class="icon-star"></i>
    </p>
    <h3><a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
    <div class="product_price">
      <p class="price_color">£51.77</p>
      <p class="instock availability">
        <i class="icon-ok"></i>
        In stock
      </p>
      <form>
        <button type="submit" class="btn btn-primary btn-block" data-loading-text="Adding...">Add to basket</button>
      </form>
    </div>
  </article>
</li>
The point is to make a class that operates on a single list element rather than on the soup object, which is your whole page. For example:
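class BookParser:
    def __init__(self, book_item):
        # book_item is one <li> element from the page
        self.book_item = book_item

    def getName(self):
        # The link text is truncated ("A Light in the ..."), so read the title attribute
        return self.book_item.find("h3").find("a")["title"]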
Then you would first parse the page, find all the book elements, and wrap each one in a BookParser:
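# BeautifulSoup needs the page markup, not the URL, so download the page first
page_content = requests.get("http://books.toscrape.com/").content
soup = BeautifulSoup(page_content, "html.parser")
book_elements = soup.find_all("li", attrs={"class": "col-xs-6"})

books = []
for be in book_elements:
    books.append(BookParser(be))

books[0].getName()  # A Light in the Attic
books[1].getName()  # Tipping the Velvet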
Upvotes: 1