Reputation: 79
I have the following HTML element:
<blockquote class="abstract">
<span class="descriptor"> abstract</span>
Abstract text goes here
</blockquote>
I am interested in getting the "Abstract text goes here" part. I have tried the following approaches in Python with BeautifulSoup:
abstract=soup.find('blockquote', {"class":'abstract mathjax'})
The above finds the correct element (I checked by printing it), but none of the following succeeds in getting the text:
print abstract.text
print abstract.find(text=True)
print abstract.get_text()
Any clues? Thank you very much in advance,
Gabriel
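For reference, here is a minimal sketch of what the lookup and the three attempts return when run against the sample HTML above (assuming BeautifulSoup 4.4 or later with the built-in "html.parser"; `string=True` is the newer name for the `text=True` argument used in the question). It shows that the two-class lookup finds nothing here, that .text and get_text() also include the " abstract" descriptor from the span, and that find(string=True) returns only the first text node, which is a bare newline.

```python
from bs4 import BeautifulSoup

# The sample HTML from the question
html = """<blockquote class="abstract">
<span class="descriptor"> abstract</span>
Abstract text goes here
</blockquote>"""

soup = BeautifulSoup(html, "html.parser")

# A class string containing a space is compared against the whole class
# attribute value, so "abstract mathjax" does not match class="abstract".
missing = soup.find('blockquote', {"class": 'abstract mathjax'})
print(missing)  # None

# Matching on the single class the element actually has works.
abstract = soup.find('blockquote', class_='abstract')

# .text and get_text() concatenate every descendant string,
# including the text of <span class="descriptor">.
print(repr(abstract.text))       # '\n abstract\nAbstract text goes here\n'
print(repr(abstract.get_text())) # '\n abstract\nAbstract text goes here\n'

# find(string=True) returns only the first text node: the newline after <blockquote>.
print(repr(abstract.find(string=True)))  # '\n'
```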
Upvotes: 0
Views: 1180
Reputation: 46759
You are trying to find an element that has both the abstract and mathjax classes, but the blockquote in your HTML only has the abstract class. Try the following:
from bs4 import BeautifulSoup
html = """<blockquote class="abstract">
<span class="descriptor"> abstract</span>
Abstract text goes here
</blockquote>"""
soup = BeautifulSoup(html, "html.parser")
# Match the single class the element actually has
abstract = soup.find('blockquote', class_='abstract')
abstract.span.extract()  # Remove the descriptor <span> from the tree
# strip() drops the newlines left around the remaining text
print(abstract.text.strip())
Which would print:
Abstract text goes here
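If you would rather not modify the parse tree, a minimal alternative sketch (assuming a recent BeautifulSoup 4 release, where select_one and the string argument are available) is to collect only the text nodes that are direct children of the blockquote. The CSS selector shown is one option; on a page where the element really carries both classes, "blockquote.abstract.mathjax" would require both.

```python
from bs4 import BeautifulSoup

html = """<blockquote class="abstract">
<span class="descriptor"> abstract</span>
Abstract text goes here
</blockquote>"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector lookup; matches any blockquote that has the "abstract" class
abstract = soup.select_one('blockquote.abstract')

# recursive=False keeps only strings that are direct children of the
# blockquote, so the text inside <span class="descriptor"> is skipped.
direct_text = ''.join(abstract.find_all(string=True, recursive=False)).strip()
print(direct_text)  # Abstract text goes here

# Nothing was extracted, so the descriptor span is still in the tree.
print(abstract.span.get_text(strip=True))  # abstract
```

Compared with extract(), this leaves the document intact, which matters if you still need the descriptor or want to run further queries on the same soup.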
Upvotes: 2