yadav

Reputation: 99

Extracting text from a td tag that contains br tags

I want to extract the text from a td tag that contains br tags.

from bs4 import BeautifulSoup
html = "<td class=\"text\">This is <br/>a breakline<br/><br/></td>"
soup = BeautifulSoup(html, 'html.parser')
print(soup.td.string)

Actual Output: None

Expected output: This is a breakline

Upvotes: 1

Views: 610

Answers (2)

niraj

Reputation: 18208

From the Beautiful Soup documentation:

If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None:
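To see why this applies to the question's td, it can help to print the tag's children. In the minimal sketch below, the td holds several text nodes interleaved with br tags, so .string returns None. A tag with a single text child (the `single` example here is hypothetical) does have a usable .string.

```python
from bs4 import BeautifulSoup

html = "<td class=\"text\">This is <br/>a breakline<br/><br/></td>"
soup = BeautifulSoup(html, "html.parser")

# The <td> has several children: text nodes mixed with <br/> tags
print(soup.td.contents)
# Because there is more than one child, .string cannot pick one and is None
print(soup.td.string)

# Hypothetical comparison: a tag with exactly one text child has a .string
single = BeautifulSoup("<td>only text</td>", "html.parser")
print(single.td.string)
```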

And if you only want the text part (also from the documentation):

If you only want the text part of a document or tag, you can use the get_text() method. It returns all the text in a document or beneath a tag, as a single Unicode string:

So you can use the following:

print(soup.get_text())

For a specific tag, use soup.td.get_text().
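A minimal sketch of get_text() on the td from the question. It also uses the separator and strip parameters of get_text(), which help when the br tags are the only thing separating words. The second HTML string, with no space before the br, is a hypothetical variant added for illustration.

```python
from bs4 import BeautifulSoup

html = "<td class=\"text\">This is <br/>a breakline<br/><br/></td>"
soup = BeautifulSoup(html, "html.parser")

# Concatenate every text node under the <td>; the space comes from the source text
text = soup.td.get_text()
print(text)

# Hypothetical variant: no space before the <br/>, so plain get_text() glues the words
html_no_space = "<td>This is<br/>a breakline</td>"
td = BeautifulSoup(html_no_space, "html.parser").td
glued = td.get_text()
# Join the text nodes with a space and strip whitespace around each one
joined = td.get_text(separator=" ", strip=True)
print(glued)
print(joined)
```

For the question's exact HTML, plain get_text() is enough. The separator form is just a more robust choice when the markup's spacing is not known in advance.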

Upvotes: 2

Alexander Ejbekov

Reputation: 5970

This will give you what you are looking for:

print(soup.td.text)

This is for the specific td tag.

Otherwise, for the whole document, you also have:

print(soup.text)
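A short sketch, assuming the same HTML as the question. It shows that .text returns the same string as get_text() with no arguments. It also shows stripped_strings, a related Beautiful Soup generator that yields each text node separately with surrounding whitespace removed.

```python
from bs4 import BeautifulSoup

html = "<td class=\"text\">This is <br/>a breakline<br/><br/></td>"
soup = BeautifulSoup(html, "html.parser")

# .text gives the same result as calling get_text() with no arguments
print(soup.td.text)
print(soup.text)

# stripped_strings yields each text node with leading/trailing whitespace removed
pieces = list(soup.td.stripped_strings)
print(pieces)
```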

Upvotes: 0
