Max Frai

Reputation: 64266

BeautifulSoup question

<parent1>
    <span>Text1</span>
</parnet1>
<parent2>
    <span>Text2</span>
</parnet2>
<parent3>
    <span>Text3</span>
</parnet3>

I'm parsing this with Python and BeautifulSoup. I have a variable soupData that holds a reference to the parsed object. How can I get a reference to parent2, for example, if all I have is the text Text2? In other words, the problem is to filter span tags by their content. How can I do this?

Upvotes: 0

Views: 226

Answers (3)

Blair

Reputation: 175

Using Python 2.7.6 and BeautifulSoup 4.3.2, I found that Marcelo's answer returns an empty list. This worked for me, however:

[x.parent for x in bSoup.findAll('span') if x.text == 'Text2'][0]

Alternatively, here is a ridiculously overengineered solution for this particular problem, though it may be useful if you need to filter on criteria too long to fit in an easily readable list comprehension:

def hasText(text):
    def hasTextFunc(x):
        return x.text == text
    return hasTextFunc

to create a function factory, then

hasTextText2 = hasText('Text2')

list(filter(hasTextText2, bSoup.findAll('span')))[0].parent

to get a reference to the parent tag that you were looking for.
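As a rough sketch of how the factory might be used on Python 3, where filter returns a lazy iterator rather than a list: the SimpleNamespace objects below are hypothetical stand-ins for span tags (only .text and .parent are modelled) so the snippet runs without BeautifulSoup, and has_text is just a snake_case spelling of hasText above.

```python
from types import SimpleNamespace

def has_text(text):
    # Factory: returns a predicate that remembers `text` via a closure.
    def predicate(tag):
        return tag.text == text
    return predicate

# Hypothetical stand-ins for the result of bSoup.findAll('span').
spans = [SimpleNamespace(text=f"Text{i}", parent=f"parent{i}") for i in (1, 2, 3)]

# next() pulls the first match from the lazy filter object;
# the None default avoids an exception when no span matches.
first = next(filter(has_text("Text2"), spans), None)
parent = first.parent if first is not None else None
print(parent)
```

Unlike indexing with [0], this form yields None instead of raising when nothing matches, which may or may not be what you want.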

Upvotes: 0

Marcelo Cantos

Reputation: 185852

After correcting the spelling on the end-tags:

[e for e in soup(recursive=False, text=False) if e.span.string == 'Text2']

Upvotes: 1

Thomas K

Reputation: 40340

I don't think there's a way to do it in a single step. So:

for parenttag in soupData:
    if parenttag.span.string == "Text2":
        do_stuff(parenttag)
        break

It's possible to use a generator expression, but it wouldn't be much shorter.
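The generator-expression form might look like the sketch below. It assumes, as the loop does, that iterating soupData yields the parent tags. The SimpleNamespace objects are hypothetical stand-ins for those tags so the snippet runs without BeautifulSoup, and first_parent_with_text is an illustrative name, not part of the library.

```python
from types import SimpleNamespace

def make_tag(name, text):
    # Hypothetical stand-in for a parsed tag: exposes .name and .span.string only.
    return SimpleNamespace(name=name, span=SimpleNamespace(string=text))

# Stand-in for soupData: an iterable of parent tags.
soupData = [
    make_tag("parent1", "Text1"),
    make_tag("parent2", "Text2"),
    make_tag("parent3", "Text3"),
]

def first_parent_with_text(tags, text):
    # next() stops at the first match, like the break in the loop;
    # the None default avoids StopIteration when nothing matches.
    return next((t for t in tags if t.span.string == text), None)

match = first_parent_with_text(soupData, "Text2")
print(match.name if match is not None else "no match")
```

With real parsed markup, the whitespace between tags can show up as string nodes during iteration, so a guard that skips anything without a span child may be needed.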

Upvotes: 1
