Reputation: 507
I have some html that I want to extract text from. Here's an example of the html:
<p>TEXT I WANT <i> – </i></p>
Now, there are, obviously, lots of <p> tags in this document. So, find('p') is not a good way to get at the text I want to extract. However, that <i> tag is the only one in the document. So, I thought I could just find the <i> and then go to the parent.
I've tried:
up = soup.select('p i').parent
and
up = soup.select('i')
print(up.parent)
and I've tried it with .parents, find_all('i'), and find('i'), but I always get:
'list' object has no attribute 'parent'
What am I doing wrong?
Upvotes: 26
Views: 81960
Reputation: 1121346
find_all() returns a list. find('i') returns the first matching element, or None.
The same applies to select() (returns a list) and select_one() (first match or None).
Thus, use:
try:
    up = soup.find('i').parent
except AttributeError:
    # no <i> element
    up = None
Demo:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<p>TEXT I WANT <i> – </i></p>', 'html.parser')
>>> soup.find('i').parent
<p>TEXT I WANT <i> – </i></p>
>>> soup.find('i').parent.text
u'TEXT I WANT \u2013 '
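To make the list-versus-single-element distinction concrete, here is a minimal sketch. It assumes Python 3 with bs4 installed, uses the built-in html.parser, and adds an extra <p> to the sample markup purely for illustration; the variable names are arbitrary.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question, plus one extra <p> (illustrative only)
html = "<p>other</p><p>TEXT I WANT <i> – </i></p>"
soup = BeautifulSoup(html, "html.parser")

matches = soup.select("p i")      # list-like result: has no .parent of its own
first = soup.select_one("p i")    # a single Tag, or None when nothing matches
missing = soup.select_one("b")    # no <b> in the markup, so this is None

# Guard against None before walking up to the parent
parent_text = first.parent.get_text() if first is not None else None
```

Checking for None explicitly does the same job as catching AttributeError, without also hiding unrelated attribute errors.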
Upvotes: 29
Reputation: 21
I think you are actually looking at a group of these tags. The select function returns a list of matching tags, so if you ask for the parent tag, it doesn't know which member of the list you mean. Try:
up = soup.select('p i')[0].parent
print(up)
The [0] says that you want the parent of the first element in the list. I haven't tested this, so try it out.
Upvotes: 0
Reputation: 61
soup.select() returns a Python list, so you have to unpack the list first, e.g.:
>>> [up] = soup.select('i')
>>> print(up.parent)
or
>>> up = soup.select('i')
>>> print(up[0].parent)
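One caveat about the unpacking form: it only works when select() returns exactly one element, and raises ValueError otherwise. A small sketch of that behaviour, assuming Python 3, bs4, and a made-up two-<i> document:

```python
from bs4 import BeautifulSoup

# Illustrative markup with two <i> tags (not from the question)
soup = BeautifulSoup("<p>one <i>a</i></p><p>two <i>b</i></p>", "html.parser")

try:
    [only] = soup.select("i")   # succeeds only if there is exactly one match
except ValueError:
    only = None                 # zero matches, or more than one

# Indexing or looping works for any number of matches
parent_names = [tag.parent.name for tag in soup.select("i")]
```

Indexing with [0] still raises IndexError on an empty result, so select_one() or find() may be the simpler choice when a missing tag is possible.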
Upvotes: 6
Reputation: 805
Both select() and find_all() return a list of elements. You should do the following:
for el in soup.select('i'):
    print(el.parent.text)
Upvotes: 5
Reputation: 7349
This works:
i_tag = soup.find('i')
my_text = str(i_tag.previous_sibling).strip()
output:
'TEXT I WANT'
As mentioned in other answers, find_all() returns a list, whereas find() returns the first match or None.
If you are unsure about the presence of an <i> tag, you could simply use a try/except block.
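As an alternative to try/except, the missing-tag case can be checked directly. The sketch below assumes Python 3 and bs4; text_before_i is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def text_before_i(html):
    """Return the stripped node just before the first <i>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    i_tag = soup.find("i")  # first <i> Tag, or None
    if i_tag is None or i_tag.previous_sibling is None:
        return None
    # previous_sibling is usually a text node here; if it is a Tag,
    # str() returns its markup instead of plain text
    return str(i_tag.previous_sibling).strip()


found = text_before_i("<p>TEXT I WANT <i> – </i></p>")
absent = text_before_i("<p>no italics here</p>")
```

With the question's markup, found is 'TEXT I WANT' and absent is None.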
Upvotes: 11