Merithor

Reputation: 70

Find text but skip other elements

I am trying to get the text out of a 'td' element, but there are more elements inside it, so find() returns the whole text inside the td tag. Here's the code:

<td class="some class">
  Some text that i want<br>
  <a href="some/link">some more text</a>       
  <span class="some other class">some more text</span>
  <br>
</td>

So what I want is only the text right after the opening td tag. I am using BeautifulSoup.

Any suggestions how to get the text without the other elements?

Upvotes: 0

Views: 150

Answers (3)

alecxe

Reputation: 473903

A common way to get "Some text that i want" is find(text=True), which returns the first text node inside a tag (Beautiful Soup 4.4.0 and later also accept the same argument under the name string):

from bs4 import BeautifulSoup

data = """<td class="some class">
  Some text that i want<br>
  <a href="some/link">some more text</a>
  <span class="some other class">some more text</span>
  <br>
</td>"""

soup = BeautifulSoup(data, "html.parser")
text = soup.find("td", class_="some class").find(text=True)
print(text.strip())  # prints "Some text that i want"

Another option is to take the first item from .stripped_strings, a generator that yields all the text nodes inside a tag with surrounding whitespace stripped:

next(soup.find("td", class_="some class").stripped_strings)
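One caveat with the one-liner above: next() raises StopIteration when the cell has no text, and find() returns None when no cell matches. A minimal sketch of a guard, assuming a hypothetical helper named first_text (not part of Beautiful Soup) and made-up sample markup:

```python
from bs4 import BeautifulSoup


def first_text(tag):
    """Return the first non-empty stripped string inside tag, or None.

    Hypothetical helper for illustration; not a Beautiful Soup API.
    """
    if tag is None:  # find() returned nothing
        return None
    # The second argument to next() is the default used when the generator is empty
    return next(tag.stripped_strings, None)


# Made-up sample markup: one cell with text, one empty cell
sample = '<table><tr><td class="a">Hello<br><b>x</b></td><td class="b"></td></tr></table>'
sample_soup = BeautifulSoup(sample, "html.parser")

print(first_text(sample_soup.find("td", class_="a")))  # Hello
print(first_text(sample_soup.find("td", class_="b")))  # None (empty cell)
print(first_text(sample_soup.find("td", class_="c")))  # None (no such cell)
```

Keep in mind that .stripped_strings also descends into child tags, so if a cell starts with a nested element, the first string comes from inside that element.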

Upvotes: 0

H. Lewroll

Reputation: 106

For the first text only, you can find the 'td' element, convert its children into a list, and take the first item:

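from bs4 import BeautifulSoup

t = '''
<td class="some class">
  Some text that i want<br>
  <a href="some/link">some more text</a>
  <span class="some other class">some more text</span>
  <br>
</td>
'''

soup = BeautifulSoup(t, "html.parser")

# Iterating over a tag yields its direct children; the first one here is the leading text node
text = list(soup.find('td'))[0]
print(text.strip())  # prints "Some text that i want"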
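Taking index 0 relies on the text being the very first child of the cell. If that is not guaranteed, one possible variation is to filter the direct children by type and keep only the text nodes. This is a sketch under that assumption; the variable names html, td and direct_text are mine:

```python
from bs4 import BeautifulSoup, NavigableString

html = """<td class="some class">
  Some text that i want<br>
  <a href="some/link">some more text</a>
  <span class="some other class">some more text</span>
  <br>
</td>"""

td = BeautifulSoup(html, "html.parser").find("td")

# Keep only direct children that are text nodes; tags such as <a> and <span>
# are skipped, and whitespace-only nodes between tags are dropped.
direct_text = [
    child.strip()
    for child in td.children
    if isinstance(child, NavigableString) and child.strip()
]
print(direct_text)  # ['Some text that i want']
```

Because only direct children are inspected, the text inside the link and the span never shows up in the result.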

Upvotes: 1

Abdul Fatir

Reputation: 6357

Use .text on that element. Note that it returns the text of every descendant, not only the first text node:

import bs4

b = bs4.BeautifulSoup("""<td class="some class">
Some text that i want<br>
<a href="some/link">some more text</a>
<span class="some other class">some more text</span>
<br>
</td>""", "html.parser")
txt = b.find('td').text
# txt will be: '\nSome text that i want\nsome more text\nsome more text\n\n'
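To make the difference concrete, here is a small comparison of the whole-cell text with the first text node only. It is a sketch, not a definitive recipe; the names cell_html, cell, all_text and first_only are assumptions for illustration:

```python
from bs4 import BeautifulSoup

cell_html = """<td class="some class">
Some text that i want<br>
<a href="some/link">some more text</a>
<span class="some other class">some more text</span>
<br>
</td>"""

cell = BeautifulSoup(cell_html, "html.parser").find("td")

# get_text() is the method behind .text; strip=True trims each piece and
# drops the empty ones, and " " is used as the separator when joining.
all_text = cell.get_text(" ", strip=True)

# The first text node only, for comparison
first_only = next(cell.stripped_strings, None)

print(all_text)    # Some text that i want some more text some more text
print(first_only)  # Some text that i want
```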

Upvotes: 0
