Reputation: 33
I am using BeautifulSoup to parse an HTML document and have run into a problem. The text I need to extract is plain text that is not wrapped in any tag of its own. I am not sure whether BeautifulSoup can grab text like that, or whether I need to use a regex instead.
<strike style="color: #777777">975</strike> 487 RP<div class="gs-container default-2-col">
I am trying to extract the "487".
Thanks!
Upvotes: 1
Views: 2439
Reputation: 89285
You can use the previous or next tag as an anchor to find the text. For example, find the <strike> element first, and then get the text node next to it:
from bs4 import BeautifulSoup
html = """<strike style="color: #777777">975</strike> 487 RP<div class="gs-container default-2-col">"""
# name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')
# find the <strike> element first, then get the text node next to it
result = soup.find('strike', {'style': 'color: #777777'}).find_next_sibling(string=True)
print(repr(str(result)))
# output: ' 487 RP' (note the leading space)
# you can then do simple text manipulation/regex to clean up the result
Note that the code above is only a demo; it does not accomplish your entire task.
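For the clean-up step, here is a minimal sketch that pulls the number out of the text node with a regex. The helper name number_after_strike is hypothetical, and the sketch assumes the value you want is the first run of digits after the <strike> element; adjust the pattern if your real pages differ.

```python
import re
from bs4 import BeautifulSoup

html = '<strike style="color: #777777">975</strike> 487 RP<div class="gs-container default-2-col">'
soup = BeautifulSoup(html, "html.parser")

def number_after_strike(soup):
    """Hypothetical helper: return the first integer in the text node
    that follows the first <strike> element, or None if nothing matches."""
    strike = soup.find("strike")
    if strike is None:
        return None
    # Next sibling that is a text node (skips over sibling tags)
    text = strike.find_next_sibling(string=True)
    if text is None:
        return None
    # Assumption: the wanted value is the first run of digits in that text
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None

number = number_after_strike(soup)
print(number)  # 487
```

Returning None instead of raising keeps the helper safe to call on pages where the <strike> element or the trailing text is missing.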
Upvotes: 6