Reputation: 5779
I'm using Beautiful Soup to do the following:
section = soup.findAll('tbody')[0]
How can I set a variable to the first list item like that without it throwing an exception (IndexError: list index out of range) if BS4 can't find tbody?
Any ideas?
Upvotes: 2
Views: 9590
Reputation: 5452
The documentation says:
Because find_all() is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the BeautifulSoup object or a Tag object as though it were a function, then it’s the same as calling find_all() on that object.
so in your case I think you can just do:
if soup("tbody"):
    section = soup("tbody")[0]
Note that in your code, when the error occurs, findAll() has returned an empty list, but you are trying to access element [0], which doesn't exist. The code above first checks that the list is not empty, and only then accesses its first element.
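As a rough sketch of the same check, the result can be stored once so the document is not searched twice. The helper name first_tbody and the sample HTML strings are made up for illustration, and the built-in "html.parser" is assumed as the parser:

```python
from bs4 import BeautifulSoup

# Hypothetical sample documents, one with a tbody and one without.
html_with = "<table><tbody><tr><td>cell</td></tr></tbody></table>"
html_without = "<table><tr><td>cell</td></tr></table>"


def first_tbody(html):
    """Return the first <tbody> tag, or None if the document has none."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup("tbody")  # shortcut for soup.find_all("tbody")
    # An empty result is falsy, so this never indexes into an empty list.
    return matches[0] if matches else None


found = first_tbody(html_with)
missing = first_tbody(html_without)
print(found is not None, missing is None)
```

The caller then tests the return value against None before using it.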
Upvotes: 3
Reputation: 19648
Everyone who parses HTML runs into this kind of question. The element you are looking for sits inside a nested structure: table -> tbody -> tr -> td, and so on.
However, keep a few things in mind:
(1) The more detailed the path you specify to reach your element, the more easily your code breaks if you don't handle exceptions correctly, and the logic behind the path may not generalize at all.
(2) Try to locate elements by unique id or class instead of relying on the order of generic tags.
(3) If the text you are trying to collect follows a pattern, you can find it directly by searching the text itself, which is more straightforward for the programmer. Text is what people actually see.
import re
...
print(soup.find_all(text=re.compile("pattern")))
# Each match is a string; use its .parent attribute to get the enclosing element.
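A small, self-contained sketch of the text-based lookup might look like this. The HTML and the "Price:" pattern are invented for the example, and it uses string=, which is the name newer Beautiful Soup 4 releases use for the text= argument:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; only the visible text is used to locate the cell.
html = "<table><tr><td>Price: 42 USD</td><td>Other</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# find_all with a compiled regex returns the matching text nodes.
matches = soup.find_all(string=re.compile(r"Price:"))

# Each text node knows its enclosing tag through .parent.
cells = [m.parent for m in matches]
cell_names = [c.name for c in cells]

for cell in cells:
    print(cell.name, cell.get_text())
```

If nothing matches, find_all returns an empty list, so the loop simply does not run and no IndexError can occur.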
In short, in my view one should never search for a "tbody" tag, because the markup always looks like this:
<table..>
<tbody>
<tr>
...
</tbody>
</table>
If you have found the table already, you can just do
table = soup.find('table'...)
# Unless you want only the direct child rows; in that case find tbody first and call find_all('tr', recursive=False) on it.
table.find_all('tr')
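To illustrate the difference between the two searches, here is a minimal sketch with a nested table. The markup and the id "outer" are assumptions made up for the example:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an outer table whose only row contains an inner table.
html = """
<table id="outer">
  <tbody>
    <tr><td>outer row
      <table><tbody><tr><td>inner row</td></tr></tbody></table>
    </td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="outer")

# Default search is recursive, so rows of the nested table are included.
all_rows = table.find_all("tr")

# recursive=False only looks at direct children, so tbody must be found first.
body = table.find("tbody")
direct_rows = body.find_all("tr", recursive=False) if body is not None else []

print(len(all_rows), len(direct_rows))
```

The guard on body keeps the non-recursive variant safe for tables that have no tbody in the source.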
Upvotes: 5
Reputation: 2111
You can store the result of findAll and check its length first:
x = soup.findAll("tbody")
if x is not None and len(x) > 0:
    section = x[0]
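A possible alternative, not the approach shown above, is find(), which returns None instead of raising when nothing matches. The sample HTML and the fallback to searching the whole document are assumptions for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical document without a tbody element.
soup = BeautifulSoup("<table><tr><td>x</td></tr></table>", "html.parser")

# find() returns the first match, or None when there is no match.
section = soup.find("tbody")

if section is None:
    # Assumed fallback: search the whole document for rows instead.
    rows = soup.find_all("tr")
else:
    rows = section.find_all("tr")

print(section, len(rows))
```

This trades the length check for a None check, which some readers find easier to follow.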
Upvotes: 4