Wil Koetsier

Reputation: 190

Does BeautifulSoup find_all() preserve tag order?

I want to use BeautifulSoup to parse some HTML. I have a table with several rows. I'm trying to find a row that meets certain conditions (certain attribute values) and use the index of that row later on in my code.

The question is: does find_all() preserve the order of my rows in the result set that it returns?

I didn't find this in the docs, and Googling only got me to this answer:

'BeautifulSoup tags don't track their order in the page, no.'

but its author does not say where that information comes from.

I'd be happy with an answer, but even more happy with a pointer to some documentation that explains this.

Edit: dstudeba pointed me in the direction of this 'workaround' using find_next_sibling.

from bs4 import BeautifulSoup

# Parse the file; the context manager closes it afterwards
with open('./mytable.html') as f:
    soup = BeautifulSoup(f, 'html.parser')

row_attrs = {'class': 'something', 'someattr': 'somevalue'}
row = soup.find('tr', row_attrs)
myvalues = []
# Walk the matching rows in document order via their sibling links
while row:
    cell = row.find('td', {'someattr': 'cellspecificvalue'})
    if cell:  # skip rows that lack the cell of interest
        myvalues.append(cell.get_text())
    row = row.find_next_sibling('tr', row_attrs)

This gets me the cell contents I need in the order they appear in my html file.

However, I'd still like to know where in the BeautifulSoup docs I could find whether find_all() preserves order or not. This is why I'm not accepting dstudeba's answer. (My upvote doesn't show, not enough rep yet :P)

Upvotes: 11

Views: 5461

Answers (1)

dstudeba

Reputation: 9048

In my experience, find_all does preserve order. However, to make sure, you can use the find_all_next method, which is built on find_next and returns the matching elements in the order they appear after the starting tag. Here is a link to the documentation.
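A quick way to check this yourself is to compare the two approaches on a small table. The sketch below uses a made-up table (the class name `something` and the attribute `someattr` are placeholders borrowed from the question, not anything from a real page). It shows the behaviour on this one input; it is not a substitute for a documented guarantee.

```python
from bs4 import BeautifulSoup

# Made-up table for illustration; the class and attribute names are placeholders.
html = """
<table>
  <tr class="something" someattr="a"><td>first</td></tr>
  <tr class="other"><td>skipped</td></tr>
  <tr class="something" someattr="b"><td>second</td></tr>
  <tr class="something" someattr="c"><td>third</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Approach 1: find_all over the whole document
rows = soup.find_all("tr", class_="something")
order_from_find_all = [row.get_text(strip=True) for row in rows]

# Approach 2: start at the first match, then collect later matches with find_all_next
first = soup.find("tr", class_="something")
later = first.find_all_next("tr", class_="something")
order_from_next = [row.get_text(strip=True) for row in [first] + list(later)]

# Position of the row whose (placeholder) attribute is "b" within the find_all result
index_of_b = next(i for i, row in enumerate(rows) if row.get("someattr") == "b")

print(order_from_find_all)
print(order_from_next)
print(index_of_b)
```

If both lists come out the same on your real HTML, you can take the row index straight from the find_all result with enumerate, as in the last step, instead of walking siblings by hand.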

Upvotes: 9
