codsane

Reputation: 33

Getting text without tags using BeautifulSoup?

I have been using BeautifulSoup to parse an HTML document and have run into a problem. The text I need to extract is plain: it sits between two elements and is not wrapped in a tag of its own. I am not sure whether BeautifulSoup can grab text that is not inside a tag, or whether I need to use a regex instead.

<strike style="color: #777777">975</strike> 487 RP<div class="gs-container default-2-col">

I am trying to extract the "487".

Thanks!

Upvotes: 1

Views: 2439

Answers (1)

har07

Reputation: 89285

You can use the previous or next tag as an anchor to find the text. For example, find the <strike> element first, and then get the text node next to it:

from bs4 import BeautifulSoup

html = """<strike style="color: #777777">975</strike> 487 RP<div class="gs-container default-2-col">"""
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

# Find the <strike> element first, then get the text node that follows it
result = soup.find('strike', {'style': 'color: #777777'}).find_next_sibling(string=True)

print(repr(result))
# output: ' 487 RP'
# You can then do simple text manipulation or a regex to clean up the result

Note that the code above is only a demo; it does not accomplish your entire task.
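
For the clean-up step, here is a minimal sketch using only the standard library. It assumes the value you want is the first run of digits in the text node (as in ' 487 RP'); the helper name extract_first_int is made up for this example and is not part of BeautifulSoup.

```python
import re


def extract_first_int(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# The text node found next to the <strike> element in the question's HTML
raw = " 487 RP"
rp = extract_first_int(raw)
print(rp)  # 487
```

If the number can contain thousands separators or a decimal point, the pattern would need adjusting; for a plain integer, raw.split()[0] would also work.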

Upvotes: 6
