Carbonemys

Reputation: 13

Python Beautifulsoup4 remove <span> tags

I am scraping information from a website using this line:
offers = soup.find_all("span", "rcnt")
which gives me this result:
[<span class="rcnt">8.668</span>]
For some reason, when I tried to unwrap it, I got this:
[<span class="rcnt"></span>]
instead of 8.668.

How do I code this correctly?

Upvotes: 0

Views: 2067

Answers (3)

sulav_lfc

Reputation: 782

Just use .string to retrieve the value inside any HTML tag. Note that it is an attribute, not a method, so there are no parentheses.

from bs4 import BeautifulSoup

html = '<span class="rcnt">8.668</span>'
soup = BeautifulSoup(html, 'html.parser')
offers = soup.find_all('span', attrs={"class": "rcnt"})

This returns a list of all the matching span tags. Now you can use the .string attribute to retrieve the text within each span tag:

for offer in offers:
    print(offer.string)

Upvotes: 0

shaktimaan

Reputation: 12092

It is not clear from your description what code you are using to get (unwrap) the content. Here is what you can do.

offers is a list. To get the content within the span elements, do:

elements = [tag.text for tag in offers]

elements will have the contents of all of the span tags in your HTML.

>>> from bs4 import BeautifulSoup
>>> html = '<span class="rcnt">8.668</span><span class="rcnt">5.7868</span>'
>>> soup = BeautifulSoup(html, 'html.parser')
>>> offers = soup.find_all("span", "rcnt")
>>> elements = [tag.text for tag in offers]
>>> elements
[u'8.668', u'5.7868']
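
As for the empty span in the question: one likely explanation, assuming the code called unwrap() on each tag and kept the return value, is that unwrap() moves the tag's children into its parent and returns the now-empty tag it removed. A minimal sketch of that behaviour, assuming bs4 with the built-in html.parser (the variable names are only for illustration):

```python
from bs4 import BeautifulSoup

html = '<span class="rcnt">8.668</span>'
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", "rcnt")

# Read the text first, while the span still holds its content.
text = span.get_text()

# unwrap() replaces the tag with its children inside the tree
# and returns the tag that was removed, which is now empty.
removed = span.unwrap()

print(text)     # the text that was inside the span
print(removed)  # the emptied span tag
print(soup)     # the document with the span stripped out
```

So if the goal is only the value, reading .text or get_text() is enough; unwrap() is meant for editing the tree, and its result lives in soup, not in the return value.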

Upvotes: 0

Sabuj Hassan

Reputation: 39355

Use .string or .renderContents() to get the value.

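from bs4 import BeautifulSoup

htmls = '<span class="rcnt">8.668</span>'
soup = BeautifulSoup(htmls, 'html.parser')
offers = soup.find_all("span", "rcnt")

print(offers[0].string)            # this one is better
print(offers[0].renderContents())  # legacy BS3-era name; returns the inner markup as bytes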
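
One caveat worth keeping in mind: .string is only set when a tag has a single child, so it comes back as None for spans with mixed content, while get_text() joins all the nested text. A small sketch, assuming bs4 with the built-in html.parser; the second span is made-up markup for illustration, not something from the question's page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second span has mixed children (a <b> tag plus text).
html = '<span class="rcnt">8.668</span><span class="rcnt"><b>5</b>.7868</span>'
soup = BeautifulSoup(html, "html.parser")
offers = soup.find_all("span", "rcnt")

# .string is None when a tag has more than one child.
strings = [tag.string for tag in offers]

# get_text() concatenates every piece of text inside the tag.
texts = [tag.get_text() for tag in offers]

print(strings)
print(texts)
```

For plain spans like the one in the question, both give the same value.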

Upvotes: 2
