user3404005

Reputation: 187

How to remove HTML tags from strings in Python using BeautifulSoup

Programming newbie here :)

I'd like to print the prices from a website using BeautifulSoup. This is my code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-


from bs4 import BeautifulSoup, SoupStrainer
from urllib2 import urlopen

url = "Some retailer's url"
html = urlopen(url).read()
product = SoupStrainer('span',{'style': 'color:red;'})
soup = BeautifulSoup(html, parse_only=product)
print soup.prettify()

It prints the prices, still wrapped in their tags, like this:

<span style="color:red;">
 180
</span>
<span style="color:red;">
 1250
</span>
<span style="color:red;">
 380
</span>

I tried print soup.text.strip(), but it returned all the prices run together: 1801250380

Please help me print each price on its own line :)

Many thanks!

Upvotes: 3

Views: 3491

Answers (2)

Steinar Lima

Reputation: 7821

This will get you a list of strings converted to integers:

>>> [int(span.text) for span in soup.find_all('span')]
[180, 1250, 380]
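
For anyone on Python 3, here is a minimal self-contained sketch of the same idea. The html string is made up to resemble the output in the question (it is an assumption, not the real retailer page), and "html.parser" is just the parser that ships with Python:

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question's output; an assumption, not the real page.
html = """
<div>
<span style="color:red;"> 180 </span>
<span style="color:red;"> 1250 </span>
<span style="color:red;"> 380 </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Select only the red spans, then convert their text to integers.
# int() ignores surrounding whitespace, so no explicit strip() is needed.
red_spans = soup.find_all("span", attrs={"style": "color:red;"})
prices = [int(span.get_text()) for span in red_spans]

# Print one price per line.
for price in prices:
    print(price)
```

Note that int() will raise ValueError if a span ever contains something like "1,250" or a currency symbol, so this only fits pages where the spans hold bare digits.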

Upvotes: 2

jfs

Reputation: 414079

>>> print "\n".join([p.get_text(strip=True) for p in soup.find_all(product)])
180
1250
380
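
A rough Python 3 equivalent of the one-liner above, as a self-contained sketch. The html string is invented for illustration, and the second variant using get_text(separator=...) is an alternative I am adding here, not part of the original one-liner:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Sample markup with the spans back to back, which is why plain .text
# runs the numbers together. This string is an assumption, not the real page.
html = (
    '<span style="color:red;">180</span>'
    '<span style="color:red;">1250</span>'
    '<span style="color:red;">380</span>'
)

# Parse only the red spans, as in the question.
product = SoupStrainer("span", {"style": "color:red;"})
soup = BeautifulSoup(html, "html.parser", parse_only=product)

# Variant 1: take the stripped text of each span and join with newlines.
lines = [span.get_text(strip=True) for span in soup.find_all("span")]
output = "\n".join(lines)
print(output)

# Variant 2: let get_text() insert the separator between text fragments.
alternative = soup.get_text(separator="\n", strip=True)
```

Both variants keep the prices as strings, which is enough if the goal is only to print them one per line.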

Upvotes: 2
