porteclefs

Reputation: 507

BeautifulSoup parent tag

I have some html that I want to extract text from. Here's an example of the html:

<p>TEXT I WANT <i> &#8211; </i></p>

Now, there are, obviously, lots of <p> tags in this document. So, find('p') is not a good way to get at the text I want to extract. However, that <i> tag is the only one in the document. So, I thought I could just find the <i> and then go to the parent.

I've tried:

up = soup.select('p i').parent

and

up = soup.select('i')
print(up.parent)

and I've tried it with .parents, I've tried find_all('i'), find('i')... But I always get:

'list' object has no attribute 'parent'

What am I doing wrong?

Upvotes: 26

Views: 81960

Answers (5)

Martijn Pieters

Reputation: 1121346

find_all() returns a list. find('i') returns the first matching element, or None.

The same applies to select() (returns a list) and select_one() (first match or None).

Thus, use:

try:
    up = soup.find('i').parent
except AttributeError:
    # no <i> element: find() returned None
    up = None

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<p>TEXT I WANT <i> &#8211; </i></p>', 'html.parser')
>>> soup.find('i').parent
<p>TEXT I WANT <i> – </i></p>
>>> soup.find('i').parent.text
u'TEXT I WANT  \u2013 '
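
The select_one() variant mentioned above could be used the same way. This is only a minimal sketch: the two-paragraph sample markup and the explicit 'html.parser' argument are assumptions added for illustration, not part of the original question.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed): one unrelated <p> plus the <p> from the question.
html = "<p>other paragraph</p><p>TEXT I WANT <i> &#8211; </i></p>"
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first match or None, so check before using .parent.
i_tag = soup.select_one("p i")
parent = i_tag.parent if i_tag is not None else None
print(parent.name)

# select() always returns a list-like result, even for a single match.
matches = soup.select("p i")
print(len(matches))

# A selector with no match gives None rather than raising.
no_match = soup.select_one("b")
```

Checking for None explicitly avoids the try/except, at the cost of one extra line.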

Upvotes: 29

noobintech

Reputation: 21

I think you are actually looking at a group of these tags. The select() function returns a list of matching tags, so when you ask for the parent, it doesn't know which member of the list you mean. Try:

    up = soup.select('p i')[0].parent
    print(up)

This says that you want the parent of the first element in the list ([0]). I haven't tested this, so try it out.

Upvotes: 0

Chad Frederick

Reputation: 61

soup.select() returns a Python list, so you have to 'unlist' the variable, e.g.:

>>> [up] = soup.select('i')
>>> print(up.parent)

or

>>> up = soup.select('i')
>>> print(up[0].parent)
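
One caveat with the unpacking form: it only succeeds when select() returns exactly one element. A minimal sketch of both cases follows; the second document without an <i> tag and the 'html.parser' argument are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>TEXT I WANT <i> &#8211; </i></p>", "html.parser")

# Exactly one <i> in the document, so single-element unpacking works.
[up] = soup.select("i")
print(up.parent.name)

# Assumed counter-example: no <i> at all, so unpacking raises ValueError.
empty = BeautifulSoup("<p>no italics here</p>", "html.parser")
try:
    [missing] = empty.select("i")
except ValueError:
    missing = None
print(missing)
```

The same ValueError is raised when there are two or more matches, which makes the unpacking form a cheap way to assert that the tag really is unique.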

Upvotes: 6

amaslenn

Reputation: 805

Both select() and find_all() return a list of elements. Iterate over it like this:

for el in soup.select('i'):
    print(el.parent.text)

Upvotes: 5

Totem

Reputation: 7349

This works:

i_tag = soup.find('i')
my_text = str(i_tag.previous_sibling).strip()

output:

'TEXT I WANT'

As mentioned in other answers, find_all() returns a list, whereas find() returns the first match or None.

If you are unsure whether an <i> tag is present, you could simply use a try/except block.
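
As an alternative to try/except, the missing-tag case could be handled with an explicit None check. This is a rough sketch: text_before_first_i is a hypothetical helper name, and the 'html.parser' argument is an assumption not present in the original snippet.

```python
from bs4 import BeautifulSoup


def text_before_first_i(html):
    """Return the stripped text just before the first <i>, or None.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    i_tag = soup.find("i")
    # find() returns None when there is no <i>; previous_sibling is None
    # when the <i> is the first child of its parent.
    if i_tag is None or i_tag.previous_sibling is None:
        return None
    return str(i_tag.previous_sibling).strip()


found = text_before_first_i("<p>TEXT I WANT <i> &#8211; </i></p>")
absent = text_before_first_i("<p>no italics here</p>")
print(found)
print(absent)
```

Note that if the node before the <i> is itself a tag rather than plain text, str() returns its markup, so this only fits documents shaped like the example in the question.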

Upvotes: 11
