whatyouhide

Reputation: 16781

BeautifulSoup not finding parents

I really can't manage to figure this out. I parsed the following link with BeautifulSoup and I did this:

soup.find(text='Title').find_parent('h3')

And it does not find anything. If you look at the source of the linked page, you'll see an h3 tag that contains the word Titles. The exact spot is:

<h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>

If I make BS parse the line above only, it works perfectly. I tried also with:

soup.find(text='Title').find_parents('h3')
soup.find(text='Title').find_parent(class_='findSectionHeader')

Both work on that single line, but neither works on the entire HTML.

If I do soup.find(text='Titles').find_parents('div'), it works on the entire HTML.

Upvotes: 3

Views: 2302

Answers (1)

Martijn Pieters

Reputation: 1122342

Before the findSectionHeader h3 tag, there is another tag whose text is exactly Title:

>>> soup.find(text='Title').parent
<a href="/find?q=batman&amp;s=tt&amp;ref_=fn_tt">Title</a>

You need to be more specific in your search. Search for Titles instead, and loop over the matches to find the one inside an h3:

>>> soup.find(text='Titles').parent
<option value="tt">Titles</option>
>>> for elem in soup.find_all(text='Titles'):
...     parent_h3 = elem.find_parent('h3')
...     if parent_h3 is None:
...         continue
...     print(parent_h3)
... 
<h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>

find(text='...') with a plain string only matches text nodes whose entire content equals that string; it never matches a substring. Use a regular expression if you need partial matches:

>>> import re
>>> soup.find_all(text='Title')
[u'Title']
>>> soup.find_all(text=re.compile('Title'))
[u'Titles', u'Titles', u'Titles', u'Title', u'Advanced Title Search']
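To try this without the original page, here is a minimal, self-contained sketch. The HTML fragment is a made-up stand-in that only imitates the structure described above (an option, a link and the h3), so the results are for this fragment, not the real page. It uses string=, which is the newer spelling of text= in Beautiful Soup 4.4 and later, and the built-in html.parser backend.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the page structure; not the real page.
html = """
<div>
  <select><option value="tt">Titles</option></select>
  <a href="/find?q=batman&amp;s=tt">Title</a>
  <h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole text node, so 'Title' matches the
# link text but never the 'Titles' inside the h3.
first_exact = soup.find(string="Title")
exact_parent_name = first_exact.parent.name
h3_of_exact = first_exact.find_parent("h3")  # None: the link is not inside an h3

# Search for the exact text 'Titles' and keep only matches with an h3 ancestor.
headers = []
for node in soup.find_all(string="Titles"):
    parent_h3 = node.find_parent("h3")
    if parent_h3 is not None:
        headers.append(parent_h3)

# A compiled regular expression is applied with search(), so it matches substrings.
partial = [str(s) for s in soup.find_all(string=re.compile("Title"))]

print(exact_parent_name, h3_of_exact, headers, partial)
```

Filtering on find_parent('h3') is what skips the option element here; if you only ever want the section header, searching with soup.find('h3', class_='findSectionHeader') directly may be simpler.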

Upvotes: 1
