Reputation: 16781
I really can't manage to figure this out. I parsed the following link with BeautifulSoup and I did this:
soup.find(text='Title').find_parent('h3')
And it does not find anything. If you look at the source of the linked page, you'll see an h3 tag which contains the word Titles.
The exact point is:
<h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
If I make BS parse the line above only, it works perfectly. I tried also with:
soup.find(text='Title').find_parents('h3')
soup.find(text='Title').find_parent(class_='findSectionHeader')
which both work on the line only, but don't work on the entire html.
If I do a soup.find(text='Titles').find_parents('div') it works with the entire html.
Upvotes: 3
Views: 2302
Reputation: 1122342
Before the findSectionHeader H3 tag, there is another tag with Title in the text:
>>> soup.find(text='Title').parent
<a href="/find?q=batman&s=tt&ref_=fn_tt">Title</a>
You need to be more specific in your search: search for Titles instead, and loop over the matches to find the correct one:
>>> soup.find(text='Titles').parent
<option value="tt">Titles</option>
>>> for elem in soup.find_all(text='Titles'):
...     parent_h3 = elem.find_parent('h3')
...     if parent_h3 is None:
...         continue
...     print parent_h3
...
<h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
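On Python 3 with a recent Beautiful Soup, the same loop could be sketched roughly as follows. The inline HTML is a cut-down stand-in for the real page (an assumption, not the actual markup), `first_h3_parent` is a hypothetical helper name, and `string=` is the newer spelling of the `text=` argument.

```python
from bs4 import BeautifulSoup

# Assumed sample markup: a link containing "Title", an <option> containing
# "Titles", and the <h3> we actually want.
SAMPLE_HTML = """
<div>
  <a href="/find?q=batman&amp;s=tt">Title</a>
  <select><option value="tt">Titles</option></select>
  <h3 class="findSectionHeader"><a name="tt"></a>Titles</h3>
</div>
"""


def first_h3_parent(soup, label):
    """Return the first <h3> that encloses a string equal to `label`, or None."""
    for node in soup.find_all(string=label):
        parent_h3 = node.find_parent("h3")
        if parent_h3 is not None:
            return parent_h3
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
header = first_h3_parent(soup, "Titles")
missing = first_h3_parent(soup, "Title")  # only the <a> matches, so no <h3> parent
print(header)
```

Searching for `"Title"` returns `None` here for the same reason as in the question: the only exact match sits inside the link, not inside an h3.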
find(text='...') only matches the full text of a string, not part of it. Use a regular expression if you need partial matches:
>>> import re
>>> soup.find_all(text='Title')
[u'Title']
>>> soup.find_all(text=re.compile('Title'))
[u'Titles', u'Titles', u'Titles', u'Title', u'Advanced Title Search']
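As a self-contained illustration of exact versus regular-expression matching, here is a small sketch on made-up markup (the three paragraphs are an assumption chosen for the example, not taken from the page). It uses `string=`, the newer name for `text=`, and also shows an anchored pattern for when you want a few specific variants rather than every substring hit.

```python
import re

from bs4 import BeautifulSoup

# Assumed sample markup with three different strings containing "Title".
soup = BeautifulSoup(
    "<p>Title</p><p>Titles</p><p>Advanced Title Search</p>",
    "html.parser",
)

# A plain string must equal the whole text node.
exact = [str(s) for s in soup.find_all(string="Title")]

# A compiled pattern is applied as a search, so substrings match too.
partial = [str(s) for s in soup.find_all(string=re.compile("Title"))]

# Anchoring the pattern restricts it to "Title" or "Titles" only.
anchored = [str(s) for s in soup.find_all(string=re.compile(r"^Titles?$"))]

print(exact, partial, anchored)
```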
Upvotes: 1