user1862963

Reputation: 79

How to get the innermost text of a tag with Beautiful Soup?

I have the following HTML element:

<blockquote class="abstract">
<span class="descriptor"> abstract</span>
Abstract text goes here
</blockquote>

I am interested in getting the "Abstract text..." part. I have tried the following approaches in Python with BeautifulSoup.

abstract=soup.find('blockquote', {"class":'abstract mathjax'})

The above finds the element correctly (I checked by printing it), but none of the following succeeds at getting the text:

print(abstract.text)
print(abstract.find(text=True))
print(abstract.get_text())

Any clues? Thank you very much in advance,

Gabriel

Upvotes: 0

Views: 1180

Answers (1)

Martin Evans

Reputation: 46759

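Your search asks for a class attribute that is exactly "abstract mathjax", but the element in your example only has the class "abstract". Also, .text returns the text of every descendant, including the span, so the span has to be removed first. Try the following: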

from bs4 import BeautifulSoup

html = """<blockquote class="abstract">
<span class="descriptor"> abstract</span>
Abstract text goes here
</blockquote>"""

soup = BeautifulSoup(html, "html.parser")
abstract = soup.find('blockquote', class_='abstract')
abstract.span.extract()   # Remove span element
print(abstract.text.strip())   # strip() drops the newlines left around the text

Which would print:

Abstract text goes here
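
If you would rather not modify the parsed tree, a possible alternative is to collect only the strings that are direct children of the blockquote. This is a minimal sketch using the same sample HTML; the names missing and own_text are just illustrative, and it assumes the text you want is not wrapped in any further tags. It also shows that searching for the exact class string 'abstract mathjax' finds nothing in this sample.

```python
from bs4 import BeautifulSoup, NavigableString

html = """<blockquote class="abstract">
<span class="descriptor"> abstract</span>
Abstract text goes here
</blockquote>"""

soup = BeautifulSoup(html, "html.parser")

# The exact string 'abstract mathjax' does not match class="abstract",
# so this lookup finds nothing in the sample.
missing = soup.find('blockquote', {"class": 'abstract mathjax'})
print(missing)

abstract = soup.find('blockquote', class_='abstract')

# Keep only the strings that are direct children of the blockquote,
# which skips the text nested inside the span.
own_text = "".join(
    child for child in abstract.children
    if isinstance(child, NavigableString)
).strip()
print(own_text)
```

Unlike extract(), this leaves the span in place, so the soup can still be used for other lookups afterwards.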

Upvotes: 2
