dobeerman

Reputation: 1424

BeautifulSoup. Wrong element index

I've been parsing an HTML ol element and ran into a problem with the indexing of its child elements.

Let's assume we have the following element:

html_document = """
<ol>
    <li>Test lists</li>
    <li>Second option</li>
    <li>Third option</li>
</ol>
"""

So, let's parse it:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_document, 'html.parser')
all_li = tuple(soup.find_all('li'))
result = [el.parent.index(el) for el in all_li]
print(result)  # [1, 3, 5]

Why 1, 3, 5 rather than 0, 1, 2? Have I missed something?

Upvotes: 1

Views: 228

Answers (2)

xibalba1

Reputation: 544

In the definition of the index() method, we see the following code:

    def index(self, element):
        """
        Find the index of a child by identity, not value. Avoids issues with
        tag.contents.index(element) getting the index of equal elements.
        """
        for i, child in enumerate(self.contents):
            if child is element:
                return i
        raise ValueError("Tag.index: element not in tag")

So index() reports a position within the parent's .contents list. For the <ol> tag, that list has the following members:

0 <class 'bs4.element.NavigableString'> 
1 <class 'bs4.element.Tag'> <li>Test lists</li>
2 <class 'bs4.element.NavigableString'> 
3 <class 'bs4.element.Tag'> <li>Second option</li>
4 <class 'bs4.element.NavigableString'> 
5 <class 'bs4.element.Tag'> <li>Third option</li>
6 <class 'bs4.element.NavigableString'> 

In other words, the parent of your <li> tags, <ol>, has other children: the whitespace-only navigable strings that sit between the tags. You don't see them because you searched only for the list items (soup.find_all('li')), but index() still counts them.
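To make this concrete, here is a minimal sketch that prints the children of <ol> and then computes each <li>'s position among the tag children only. It assumes the built-in 'html.parser' parser; the names tag_children and positions are mine, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html_document = """
<ol>
    <li>Test lists</li>
    <li>Second option</li>
    <li>Third option</li>
</ol>
"""

soup = BeautifulSoup(html_document, "html.parser")
ol = soup.find("ol")

# Every child of <ol>, including the whitespace-only strings between the tags.
for i, child in enumerate(ol.contents):
    print(i, type(child).__name__, repr(child))

# Keep only the Tag children, then locate each <li> by identity (not equality).
tag_children = [child for child in ol.contents if isinstance(child, Tag)]
positions = [
    next(i for i, child in enumerate(tag_children) if child is li)
    for li in ol.find_all("li", recursive=False)
]
print(positions)
```

Filtering out the NavigableString children first is what turns 1, 3, 5 into 0, 1, 2. The lookup uses `is`, mirroring what Tag.index() does.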

Upvotes: 2

KunduK

Reputation: 33384

You are calling index() on the parent tag, which counts all of its children. Take the index within your own sequence of li tags instead.

from bs4 import BeautifulSoup

html_document = """
<ol>
    <li>Test lists</li>
    <li>Second option</li>
    <li>Third option</li>
</ol>
"""

soup = BeautifulSoup(html_document, 'lxml')
all_li = tuple(soup.find_all('li'))
result = [all_li.index(el) for el in all_li]
print(result)

Output:

[0, 1, 2]
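One caveat: index() on a tuple or list compares by equality, and two tags with the same name, attributes and contents compare equal, so the approach above can return the wrong position when list items repeat. A hedged alternative is to take the position from enumerate() directly. The sketch below uses a modified document with a duplicated item (my own example, not from the question) and assumes the built-in 'html.parser' parser.

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the document with two identical items.
html_document = """
<ol>
    <li>Same text</li>
    <li>Same text</li>
    <li>Third option</li>
</ol>
"""

soup = BeautifulSoup(html_document, "html.parser")
all_li = soup.find_all("li")

# Equality-based lookup: the second item matches the first one.
by_value = [all_li.index(el) for el in all_li]

# Position taken straight from the iteration order.
by_position = [i for i, _ in enumerate(all_li)]

print(by_value)
print(by_position)
```

If you need the index while looping, `for i, el in enumerate(all_li):` gives it without any lookup.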

Upvotes: 1
