Reputation: 43
I'm trying to extract the price from a web site that includes the following HTML:
<div class="book-block-price " itemprop="offers" itemtype="http://schema.org/Offer" itemscope>
<meta itemprop="price" content="29.99"/>
<meta itemprop="price" content=""/>
$ 29.99 </div>
I'm using the following Beautiful Soup code:
book_prices = soup_packtpage.find_all(class_="book-block-price ")
print(book_prices)
for book_price in book_prices:
    printable_version_price = book_price.meta.string
    print(printable_version_price)
print(book_prices) yields:
[<div class="book-block-price " itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">
<meta content="29.99" itemprop="price"/>
<meta content="" itemprop="price"/>
$ 29.99
print(printable_version_price) yields "None".
How do I deal with meta tags? Or do I have other problems?
Upvotes: 1
Views: 895
Reputation: 52748
You could probably do it with lxml's etree (pseudo-code, but should be enough to get you going):
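from lxml import etree
# x is a file-like object; use etree.fromstring(x, etree.HTMLParser()) if x is a string.
doc = etree.parse(x, etree.HTMLParser())
# Select the content attribute of every meta tag whose itemprop is "price".
print(doc.xpath('//meta[@itemprop="price"]/@content'))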
Upvotes: 0
Reputation: 473893
book_price.meta matches the first meta tag inside the book price block. A meta tag has no text content, so its .string is None, which is why you are getting None printed:
<meta itemprop="price" content="29.99"/>
Instead, get the content attribute value:
book_price.meta["content"]
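For completeness, here is a minimal self-contained sketch of that approach applied to the HTML from the question. It assumes the built-in html.parser backend and matches the class without the trailing space; the names html, prices, and block are illustrative and not from the original code. Because the block contains a second meta tag with an empty content attribute, the sketch loops over all price meta tags and keeps only the non-empty values.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<div class="book-block-price " itemprop="offers" itemtype="http://schema.org/Offer" itemscope>
<meta itemprop="price" content="29.99"/>
<meta itemprop="price" content=""/>
$ 29.99 </div>
"""

soup = BeautifulSoup(html, "html.parser")

prices = []
for block in soup.find_all(class_="book-block-price"):
    # Look at every price meta tag, not just the first one.
    for meta in block.find_all("meta", attrs={"itemprop": "price"}):
        content = meta.get("content")  # None if the attribute is missing
        if content:  # skip missing or empty values
            prices.append(content)

print(prices)
```

Using meta.get("content") rather than meta["content"] avoids a KeyError if a meta tag happens to lack the attribute.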
Upvotes: 4