Ryflex

Reputation: 5779

BeautifulSoup exception: list index out of range

I'm using BeautifulSoup to do the following:
section = soup.findAll('tbody')[0]

How can I set a variable to the first list item like that without it throwing IndexError: list index out of range when BS4 can't find a tbody?

Any ideas?

Upvotes: 2

Views: 9590

Answers (3)

Vicent

Reputation: 5452

The doc says

Because find_all() is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the BeautifulSoup object or a Tag object as though it were a function, then it’s the same as calling find_all() on that object.

so in your case I think you can just do:

if soup("tbody"):
    section = soup("tbody")[0]

Note that in your code, when the error occurs, the result of findAll is an empty list, but you're attempting to get element [0], which doesn't exist. In the above code you first check that the list is not empty. If the check passes, you can safely access the first element of the list.
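As a related option, find() returns None instead of a list when nothing matches, so there is no index to go out of range. A minimal sketch, assuming the built-in html.parser and a made-up HTML snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical input with no <tbody> at all.
html = "<div><p>no table here</p></div>"
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match, or None when there is no match.
section = soup.find("tbody")

if section is None:
    rows = []  # nothing to process
else:
    rows = section.find_all("tr")
```

Here section ends up as None and rows as an empty list, so the rest of the code can carry on without a try/except.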

Upvotes: 3

B.Mr.W.

Reputation: 19648

Everyone who parses HTML will run into this type of question. The element you are looking for is located in a nested structure: table -> tbody -> tr -> td, and so on.

However, you need to keep a few things in mind:

(1) The more detailed the path you specify to reach your element, the more easily your code will break if you don't handle exceptions correctly, and the logic behind the path may not generalize at all.

(2) Try to locate elements by unique id or class instead of relying on the order of generic tags.

(3) If the text you are trying to collect follows a pattern, you can find it easily by searching on the text itself, which is more straightforward for the programmer. Text is what people actually see.

import re
...
# string= is the current name of this argument; older bs4 versions call it text=
print(soup.find_all(string=re.compile("pattern")))
# then you can reach the element through the .parent of each matched string
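To make the text-first approach concrete, here is a small sketch. The HTML and the "Price" pattern are invented for illustration, and it assumes a bs4 version that accepts the string= argument:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; only one cell contains the text we care about.
html = (
    "<table>"
    "<tr><td>Price: 42 USD</td></tr>"
    "<tr><td>Other</td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")

# Find the text nodes matching the pattern, then step up to their tags.
matches = soup.find_all(string=re.compile(r"Price"))
cells = [m.parent for m in matches]

# An empty result is just an empty list, so this loop is always safe.
cell_names = [cell.name for cell in cells]
```

If nothing matches, matches and cells are simply empty, so no index lookup is needed.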

In short, I don't think one should ever search for a "tbody" tag, because the markup always looks like:

<table..>
    <tbody>
        <tr>
        ...
    </tbody>
</table>

If you have found the table already, you can just do

table = soup.find('table'...)
# unless you want only the direct child rows; then you have to find tbody first and call find_all('tr', recursive=False)
table.find_all('tr')
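The difference between the two calls only shows up with nested tables. A rough sketch, assuming html.parser (which keeps the markup as written) and a made-up nested table:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an outer row whose cell contains an inner table.
html = (
    "<table><tbody>"
    "<tr><td>outer"
    "<table><tr><td>inner</td></tr></table>"
    "</td></tr>"
    "</tbody></table>"
)
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table")
all_rows = table.find_all("tr")  # recursive: includes the inner table's row

tbody = table.find("tbody")
# Guard against a missing tbody before asking for its direct children.
direct_rows = tbody.find_all("tr", recursive=False) if tbody else []
```

In this snippet all_rows picks up both rows, while direct_rows holds only the outer one.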

Upvotes: 5

Andy Rimmer

Reputation: 2111

You can store the result of findAll and check its length first:

x = soup.findAll("tbody")

if x is not None and len(x) > 0:
    section = x[0]
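If this check comes up in several places, it can be folded into a small helper. This is only a sketch: first_or_none is a hypothetical name, and plain lists stand in for the findAll results so the snippet runs without any HTML:

```python
def first_or_none(items):
    """Return the first element of items, or None if items is empty."""
    return next(iter(items), None)

found = ["<tbody>...</tbody>"]  # stand-in for a non-empty findAll result
missing = []                    # stand-in for an empty findAll result

section = first_or_none(found)
fallback = first_or_none(missing)

# With a real soup object this would be used as:
# section = first_or_none(soup.findAll("tbody"))
```

The caller still has to handle None, but the IndexError can no longer happen.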

Upvotes: 4
