Getting text without tags using BeautifulSoup?

Question

I have been using BeautifulSoup to parse an HTML document and seem to have run into a problem. I found some text that I need to extract, but the text is plain. There are no tags or anything. I am not sure if I need to use Regex instead in order to do this, because I do not know if I can grab the text with BeautifulSoup considering it does not contain any tags.

975 487 RP

I am trying to extract the "487".

Thanks!

har07 · Accepted Answer

You can use the previous or next tag as an anchor to find the text. For example, find the `<strike>` element first, and then get the text node next to it:

    from bs4 import BeautifulSoup

    # markup assumed from the selector below: a <strike> tag followed by bare text
    html = """<strike style="color: #777777">975</strike> 487 RP"""
    soup = BeautifulSoup(html, "html.parser")

    # find the <strike> element first, then get the text node right after it
    result = soup.find('strike', {'style': 'color: #777777'}).find_next_sibling(string=True)
    print(result)
    # output: " 487 RP" (with a leading space)
    # you can then do simple text manipulation/regex to clean up the result

Note that the code above is only a demo; it does not accomplish your entire task.
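
For the cleanup step, here is a minimal sketch of how the number could be pulled out of the text node. It assumes the string looks like " 487 RP" (a number followed by a currency label); `first_number` is a hypothetical helper name, not part of BeautifulSoup.

```python
import re


def first_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


raw = " 487 RP"  # assumed shape of the text node that follows the <strike> tag

# Regex approach: tolerant of extra whitespace or a different label after the number
print(first_number(raw))  # 487

# Plain string approach: enough when the number is always the first token
print(raw.split()[0])  # 487
```

The regex version returns an `int` and copes with missing digits by returning `None`; the `split()` version keeps the value as a string and raises `IndexError` on an empty text node, so which one fits depends on how uniform the page is.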
