Larry Cai

Reputation: 59933

BeautifulSoup to get first value using string/text

BeautifulSoup is handy for HTML parsing in Python, but I can't find a clean way to get a cell's own value directly using .string or .text.

from bs4 import BeautifulSoup
tr ="""    
<table>
    <tr><td>text1</td></tr>
    <tr><td>text2<div>abc</div></td></tr>
</table>
"""
table = BeautifulSoup(tr,"html.parser")
for row in table.findAll("tr"):
    td = row.findAll("td")
    print td[0].text
    print td[0].string

Result:

text1
text1
text2abc
None

How can I get the following result instead?

text1
text2

I want to skip the text inside the nested inner tag.

I am using beautifulsoup4 4.5.0 with Python 2.7.

Upvotes: 1

Views: 1270

Answers (2)

JRodDynamite

Reputation: 12613

You can use .find() with text=True and recursive=False. That returns the first text node that is a direct child of the td and ignores text inside nested tags.

for row in table.findAll("tr"):
    td1 = row.td.find(text=True, recursive=False)
    print str(td1)

You'll get your output as:

text1
text2

This works regardless of where the div tag sits inside the td, as the example below shows.

>>> tr = """
... <table>
...     <tr><td>text1</td></tr>
...     <tr><td>text2<div>abc</div></td></tr>
...     <tr><td><div>abc</div>text3</td></tr>
... </table>
... """
>>> table = BeautifulSoup(tr, "html.parser")
>>> for row in table.findAll("tr"):
...     td1 = row.td.find(text=True, recursive=False)
...     print str(td1)
...
text1
text2
text3
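
If a cell can have text on both sides of the inner tag, .find() returns only the first piece. A minimal sketch of one way to collect every direct text node, assuming find_all() with the string argument (available since Beautiful Soup 4.4.0). The helper name own_text and the fourth table row are made up for illustration, and print() is used so the snippet also runs on Python 3:

```python
from bs4 import BeautifulSoup

html = """
<table>
    <tr><td>text1</td></tr>
    <tr><td>text2<div>abc</div></td></tr>
    <tr><td><div>abc</div>text3</td></tr>
    <tr><td>left<div>abc</div>right</td></tr>
</table>
"""


def own_text(tag):
    """Join the text nodes that are direct children of `tag` (hypothetical helper)."""
    # recursive=False limits the search to direct children, so text
    # inside the nested <div> is never visited.
    pieces = tag.find_all(string=True, recursive=False)
    return "".join(pieces).strip()


soup = BeautifulSoup(html, "html.parser")
results = [own_text(row.td) for row in soup.find_all("tr")]
print(results)
```

On Python 3 this should print ['text1', 'text2', 'text3', 'leftright']. The pieces are joined without a separator here; pass a different separator to join() if the parts should stay apart.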

Upvotes: 3

Sam

Reputation: 4090

You could try this:

for row in table.findAll("tr"):
    td = row.findAll("td")
    t = td[0]
    print t.contents[0]

But contents[0] is simply the first child of the td, so this only works when the text you want comes before the div tag.
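
A small sketch of that limitation, plus one possible workaround that is not part of the answer above: filter .contents for NavigableString instances instead of taking index 0. The one-row table is made up for illustration:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Here the <div> comes first, so the text is NOT at index 0.
html = "<table><tr><td><div>abc</div>text3</td></tr></table>"
td = BeautifulSoup(html, "html.parser").td

# contents[0] is the nested <div> tag, not the text we want.
first_child = td.contents[0]

# Take the first direct child that is a text node, or None if there is none.
first_text = next(
    (child for child in td.contents if isinstance(child, NavigableString)),
    None,
)

print(first_child.name)
print(first_text)
```

This should print div followed by text3, which is the same node that find(text=True, recursive=False) returns.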

Upvotes: 1
