Reputation: 190
I want to use BeautifulSoup to parse some HTML. I have a table with several rows. I'm trying to find a row that meets certain conditions (certain attribute values) and use the index of that row later on in my code.
The question is: does find_all() preserve the order of my rows in the result set that it returns?
I didn't find this in the docs, and Googling only got me to this answer:
'BeautifulSoup tags don't track their order in the page, no.'
but its author does not say where that information comes from.
I'd be happy with an answer, but even more happy with a pointer to some documentation that explains this.
Edit: dstudeba pointed me in the direction of this 'workaround' using next_sibling.
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('./mytable.html'), 'html.parser')
row = soup.find('tr', {'class':'something', 'someattr':'somevalue'})
myvalues = []
while True:
    cell = row.find('td', {'someattr':'cellspecificvalue'})
    myvalues.append(cell.get_text())
    row = row.find_next_sibling('tr', {'class':'something', 'someattr':'somevalue'})
    if not row:
        break
This gets me the cell contents I need in the order they appear in my html file.
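For reference, the same traversal can probably be written more compactly with find_next_siblings(), which returns the matching later siblings of the starting row. This is only a sketch: the inline HTML below is a made-up stand-in for mytable.html, and the guard against a missing cell is an addition that the loop above does not have.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for ./mytable.html (not the real file)
html = """
<table>
<tr class="something" someattr="somevalue"><td someattr="cellspecificvalue">first</td></tr>
<tr class="other"><td>skipped</td></tr>
<tr class="something" someattr="somevalue"><td someattr="cellspecificvalue">second</td></tr>
<tr class="something" someattr="somevalue"><td someattr="cellspecificvalue">third</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
row_filter = {"class": "something", "someattr": "somevalue"}

# Start from the first matching row, then collect its matching later siblings
first_row = soup.find("tr", row_filter)
rows = [first_row] + list(first_row.find_next_siblings("tr", row_filter))

myvalues = []
for row in rows:
    cell = row.find("td", {"someattr": "cellspecificvalue"})
    if cell is not None:  # skip rows without the expected cell
        myvalues.append(cell.get_text())

print(myvalues)
```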
However, I'd still like to know where in the BeautifulSoup docs I could find whether find_all() preserves order or not. This is why I'm not accepting dstudeba's answer. (My upvote doesn't show, not enough rep yet :P)
Upvotes: 11
Views: 5461
Reputation: 9048
In my experience, find_all does preserve order. To be sure, however, you can use the find_all_next method, which, like find_next, walks the elements that follow in document order. Here is a link to the documentation.
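One way to check this empirically is a small sketch like the one below. The HTML, the id attributes, and the "something" class are made up for illustration. It compares find_all() with find_all_next() on the same table and uses enumerate() to recover the index of the first matching row. It is a check on one document, not a guarantee taken from the docs.

```python
from bs4 import BeautifulSoup

# Made-up table; the id values record each row's position in the source
html = """
<table>
<tr id="r0"><td>a</td></tr>
<tr id="r1" class="something"><td>b</td></tr>
<tr id="r2"><td>c</td></tr>
<tr id="r3" class="something"><td>d</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Rows as returned by find_all()
rows = soup.find_all("tr")
ids = [row["id"] for row in rows]

# Rows reached by walking forward from the <table> tag
table = soup.find("table")
ids_next = [row["id"] for row in table.find_all_next("tr")]

# Index of the first row carrying the (hypothetical) class "something"
match_index = next(
    i for i, row in enumerate(rows) if "something" in row.get("class", [])
)

print(ids)
print(ids_next)
print(match_index)
```

If both lists come out in source order, the index from enumerate() can be used later to refer back to rows[match_index].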
Upvotes: 9