Reputation: 99
I want to extract the text from a td tag that contains br tags.
from bs4 import BeautifulSoup
html = "<td class=\"text\">This is <br/>a breakline<br/><br/></td>"
soup = BeautifulSoup(html, 'html.parser')
print(soup.td.string)
Actual Output: None
Expected output: This is a breakline
Upvotes: 1
Views: 610
Reputation: 18208
From the Beautiful Soup documentation:
If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None.
And if you want the text part (also from the documentation):
If you only want the text part of a document or tag, you can use the get_text() method. It returns all the text in a document or beneath a tag, as a single Unicode string.
So you can use the following:
print(soup.get_text())
For a specific tag, use soup.td.get_text().
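To make the difference concrete, here is a minimal sketch using the HTML from the question. The variable names (td, single, text, normalized) are only for illustration, and the separator/strip arguments are an optional extra that I am assuming is useful when the pieces around the br tags need tidying:

```python
from bs4 import BeautifulSoup

html = '<td class="text">This is <br/>a breakline<br/><br/></td>'
soup = BeautifulSoup(html, "html.parser")
td = soup.td

# .string is None because the td has several children (two strings, three br tags)
single = td.string

# get_text() concatenates every string beneath the tag
text = td.get_text()

# Optional: strip each piece and join the pieces with a single space
normalized = td.get_text(separator=" ", strip=True)

print(single)
print(text)
print(normalized)
```

With this particular input, both get_text() calls should print "This is a breakline". The separator/strip form tends to matter more when the pieces carry stray whitespace or newlines.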
Upvotes: 2
Reputation: 5970
This will give you what you are looking for:
print(soup.td.text)
This is for the specific td tag.
For the whole document, you can also use:
print(soup.text)
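As far as I know, .text is just a property that calls get_text(), so the two give the same result. If you would rather have the pieces separately, stripped_strings is another option. A small sketch with the HTML from the question (pieces and joined are illustrative names):

```python
from bs4 import BeautifulSoup

html = '<td class="text">This is <br/>a breakline<br/><br/></td>'
soup = BeautifulSoup(html, "html.parser")

# .text and get_text() return the same string
same = soup.td.text == soup.td.get_text()

# stripped_strings yields each non-empty string with surrounding whitespace removed
pieces = list(soup.td.stripped_strings)
joined = " ".join(pieces)

print(same)
print(pieces)
print(joined)
```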
Upvotes: 0