BeautifulSoup is handy for HTML parsing in Python, but I'm having trouble writing clean code that gets the value directly using `.string` or `.text`:
```
from bs4 import BeautifulSoup

tr = """
<table>
<tr><td>text1</td></tr>
<tr><td>text2<div>abc</div></td></tr>
</table>
"""
table = BeautifulSoup(tr, "html.parser")
for row in table.find_all("tr"):
    td = row.find_all("td")
    print(td[0].text)
    print(td[0].string)
```
Result:

```
text1
text1
text2abc
None
```
How can I get this result instead?

```
text1
text2
```

I want to skip the text inside the extra inner tag.
I'm using beautifulsoup4 4.5.0 with Python 2.7.
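For context on the output above: `.text` (the same as `get_text()`) joins every string inside the tag, nested ones included, while `.string` returns `None` as soon as a tag has more than one child. A small check of each cell's direct children (`.contents`) makes this visible. This is a sketch run with `html.parser`; the names `html` and `cells` are just illustrative.

```python
from bs4 import BeautifulSoup

html = """
<table>
<tr><td>text1</td></tr>
<tr><td>text2<div>abc</div></td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser")
cells = [row.td for row in table.find_all("tr")]

for td in cells:
    # .contents holds only the direct children (strings and tags)
    print(len(td.contents), td.get_text(), td.string)
```

The second cell has two direct children (the string `text2` and the `<div>`), which is why its `.string` is `None` and its `.text` is `text2abc`.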
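You could simply use the `.find()` method with the `string` and `recursive` arguments. `string=True` matches text nodes instead of tags (`string=` is the newer name for the `text=` argument; 4.5.0 accepts both), and `recursive=False` restricts the search to the direct children of the `<td>`, so the text inside the nested `<div>` is never considered.

```
for row in table.find_all("tr"):
    td1 = row.td.find(string=True, recursive=False)
    print(str(td1))
```

You'll get this output:

```
text1
text2
```

This works regardless of where the `<div>` tag sits inside the cell, as the example below shows:

```
from bs4 import BeautifulSoup

tr = """
<table>
<tr><td>text1</td></tr>
<tr><td>text2<div>abc</div></td></tr>
<tr><td><div>abc</div>text3</td></tr>
</table>
"""
table = BeautifulSoup(tr, "html.parser")
for row in table.find_all("tr"):
    td1 = row.td.find(string=True, recursive=False)
    print(str(td1))
```

Output:

```
text1
text2
text3
```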
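One caveat: `.find()` returns only the first direct text node, so a cell such as `<td>text4<div>abc</div>more</td>` would give just `text4`. If you need all of the cell's own text, a variation (not part of the answer above) is to use `find_all` with the same arguments and join the pieces. `direct_text` is a hypothetical helper name, and joining with a single space is an assumption about the output you want.

```python
from bs4 import BeautifulSoup

html = """
<table>
<tr><td>text1</td></tr>
<tr><td>text2<div>abc</div></td></tr>
<tr><td><div>abc</div>text3</td></tr>
<tr><td>text4<div>abc</div>more</td></tr>
</table>
"""

def direct_text(td):
    """Hypothetical helper: join the strings that are direct children of td."""
    pieces = td.find_all(string=True, recursive=False)
    # strip each piece and drop whitespace-only strings before joining
    return " ".join(s.strip() for s in pieces if s.strip())

table = BeautifulSoup(html, "html.parser")
results = [direct_text(row.td) for row in table.find_all("tr")]
print(results)
```

Text inside the nested `<div>` is still skipped, because `recursive=False` never descends into child tags.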
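You could try this:

```
for row in table.find_all("tr"):
    td = row.find_all("td")
    t = td[0]
    print(t.contents[0])
```

But that only works if the text you want always comes before the `<div>` tag. `.contents[0]` is simply the cell's first child, so for a cell like `<td><div>abc</div>text3</td>` it returns the `<div>` tag itself.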
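The limitation can be made explicit with a small guard. This is a sketch with a hypothetical helper (`leading_text` is not part of Beautiful Soup) that returns `contents[0]` only when it is a text node (`NavigableString`), so a cell that begins with a tag yields `None` instead of the tag:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = """
<table>
<tr><td>text1</td></tr>
<tr><td>text2<div>abc</div></td></tr>
<tr><td><div>abc</div>text3</td></tr>
</table>
"""

def leading_text(td):
    """Hypothetical helper: return the cell's first child if it is a string."""
    if td.contents and isinstance(td.contents[0], NavigableString):
        return str(td.contents[0])
    # the cell is empty or starts with a tag such as <div>
    return None

table = BeautifulSoup(html, "html.parser")
firsts = [leading_text(row.td) for row in table.find_all("tr")]
print(firsts)
```

Note that HTML comments are a subclass of `NavigableString`, so a cell that starts with a comment would also pass this check.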