Alex

Reputation: 44405

Using BeautifulSoup to parse a tag with some text

Some HTML code contains dt tags like the following:

<dt>PLZ:</dt>
<dd>
8047
</dd>

I want to find the text in the dd tag that follows the dt tag with the text PLZ:. According to the documentation, I am trying the following:

number = BeautifulSoup(text).find("dt",text="PLZ:").findNextSiblings("dd")

where text is the string above, but all I get is an empty list instead of the number I am looking for (as a string, of course). Maybe I misunderstand the documentation?

Upvotes: 2

Views: 2004

Answers (2)

Vahid Chakoshy

Reputation: 1527

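In BeautifulSoup 3, find with text= returns the matched string rather than the dt tag, so go through .parent first. Just try: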

from BeautifulSoup import BeautifulSoup

text = """
<dt>PLZ:</dt>
<dd>
8047
</dd>"""

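# .parent is the <dt> tag; collect the <dd> siblings that come after it
number = BeautifulSoup(text).find("dt",text="PLZ:").parent.findNextSiblings("dd")
# number is a list of <dd> tags; print the text of the first one without whitespace
print number[0].string.strip()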

Or, if you prefer findNext, try:

number = BeautifulSoup(text).find("dt",text="PLZ:").parent.findNext("dd").contents[0]
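
For anyone on the newer bs4 package (an assumption on my part; the code above uses the old BeautifulSoup 3 import), here is a minimal sketch of the same lookup. In bs4, find with a tag name plus string= returns the dt tag itself, so the .parent step should not be needed. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup  # assumes the bs4 package is installed

html = """<dt>PLZ:</dt>
<dd>
8047
</dd>"""

# Use the stdlib parser so no extra parser library is required
soup = BeautifulSoup(html, "html.parser")

# With a tag name and string=, bs4 returns the <dt> tag, not the text node
dt = soup.find("dt", string="PLZ:")

# Skip the whitespace text node and take the first following <dd> sibling
dd = dt.find_next_sibling("dd")

# get_text(strip=True) removes the surrounding newlines
number = dd.get_text(strip=True)
print(number)
```

If the label might be missing from the page, check that dt and dd are not None before calling methods on them.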

Upvotes: 2

Brian Cain

Reputation: 14619

This worked for me:

from BeautifulSoup import BeautifulSoup

text = '''<dt>PLZ:</dt>
<dd>
8047
</dd>'''


BeautifulSoup(text).find("dt",text="PLZ:").parent.findNextSiblings('dd')

Upvotes: 0
