BrianM

Reputation: 43

Python Beautiful Soup: reading the content attribute of a meta tag

I'm trying to extract the price from a web site that includes the following HTML:

<div class="book-block-price " itemprop="offers" itemtype="http://schema.org/Offer" itemscope>
<meta itemprop="price" content="29.99"/>
<meta itemprop="price" content=""/>
    $ 29.99         </div>

I'm using the following Beautiful Soup code:

book_prices = soup_packtpage.find_all(class_="book-block-price ")
print(book_prices)
for book_price in book_prices:
    printable_version_price = book_price.meta.string
    print(printable_version_price)

print(book_prices) yields:

[<div class="book-block-price " itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">
<meta content="29.99" itemprop="price"/>
<meta content="" itemprop="price"/>
            $ 29.99     

print(printable_version_price) yields "None".

How do I deal with meta tags? Or do I have other problems?

Upvotes: 1

Views: 895

Answers (2)

chown

Reputation: 52748

You could probably do it with lxml (an untested sketch, but it should be enough to get you going). Note that the XPath has to test the attribute with @itemprop and read @content, since a meta tag has no text:

from lxml import html

# html_text is the page source as a string
doc = html.fromstring(html_text)
print(doc.xpath('//meta[@itemprop="price"]/@content'))

Upvotes: 0

alecxe

Reputation: 473893

book_price.meta matches the first meta tag inside the book price block. A meta tag has no text content, so its .string is None, which is why None gets printed. The price is stored in the tag's content attribute instead:

<meta itemprop="price" content="29.99"/>

Instead, get the content attribute value:

book_price.meta["content"]
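
Here is a minimal sketch of that fix applied to the HTML from the question, assuming it is parsed with the built-in html.parser. The names html_snippet, first_string, first_content and prices are made up for illustration. Skipping the empty content value of the second meta tag is an extra precaution, not something the question asked for.

```python
from bs4 import BeautifulSoup

# HTML copied from the question
html_snippet = """
<div class="book-block-price " itemprop="offers" itemtype="http://schema.org/Offer" itemscope>
<meta itemprop="price" content="29.99"/>
<meta itemprop="price" content=""/>
    $ 29.99         </div>
"""

soup = BeautifulSoup(html_snippet, "html.parser")

prices = []
for book_price in soup.find_all("div", class_="book-block-price"):
    first_meta = book_price.meta
    first_string = first_meta.string       # meta has no children, so no text
    first_content = first_meta["content"]  # the attribute holds the value

    # Precaution (an assumption): keep the first non-empty content value
    contents = [m.get("content") for m in book_price.find_all("meta", itemprop="price")]
    non_empty = [c for c in contents if c]
    prices.append(non_empty[0] if non_empty else None)

print(first_string, first_content, prices)
```

Using m.get("content") rather than m["content"] avoids a KeyError if some meta tag on the real page has no content attribute at all.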

Upvotes: 4
