I wrote this code to retrieve text from a table on this page. It works fine for the first column:
from bs4 import BeautifulSoup
import urllib2 #xbmc, xbmcgui, xbmcaddon
url = 'http://racing4everyone.eu/formula-e-201516/'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), 'html.parser')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].text
    print first_column
However, when I try to extract the same data from the second column:
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    second_column = row.findAll('th')[1].text
    print second_column
I get an error:
ePrix
Traceback (most recent call last):
  File "addon.py", line 9, in <module>
    second_column = row.findAll('th')[1].text
IndexError: list index out of range
What am I doing wrong?
This is because every row except the first (header) row contains only a single th element:
<tr>
    <th>1</th>
    <td>...</td>
    ...
    <td>24 October 2015</td>
</tr>
In those rows the second column is a td element, not a th, so row.findAll('th')[1] does not exist. You need to find the td elements in each row and take the first one:
for row in soup.find_all('table')[0].tbody.find_all('tr')[1:]:
    print(row.find_all('td')[0].text)
The [1:] slice here skips the first (header) row.
Prints:
Beijing ePrix
Putrajaya ePrix
Punta del Este ePrix
Buenos Aires ePrix
Mexico
Long Beach ePrix
Paris ePrix
Berlin ePrix
Moscow ePrix
London ePrix Race 1
London ePrix Race 2
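If you would rather address columns by position without caring whether a cell is a th or a td, one option is to pass both tag names to find_all. The following is only a sketch: the column_text helper and the SAMPLE_HTML markup are made up for illustration (the real page is not fetched), and it assumes the table has the layout described above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not the real page): a header row of th cells,
# then data rows with one th followed by td cells.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><th>Round</th><th>ePrix</th><th>Date</th></tr>
    <tr><th>1</th><td>Beijing ePrix</td><td>24 October 2015</td></tr>
    <tr><th>2</th><td>Putrajaya ePrix</td><td>7 November 2015</td></tr>
  </tbody>
</table>
"""


def column_text(soup, index, skip_header=True):
    """Return the text of the cell at position `index` in each row of the first table."""
    rows = soup.find_all('table')[0].tbody.find_all('tr')
    if skip_header:
        rows = rows[1:]
    values = []
    for row in rows:
        # Passing a list matches both tag names, in document order.
        cells = row.find_all(['th', 'td'])
        # Skip rows that are too short instead of raising IndexError.
        if index < len(cells):
            values.append(cells[index].get_text(strip=True))
    return values


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
second_column = column_text(soup, 1)
print(second_column)
```

With this approach index 0 is always the first cell of a row and index 1 the second, regardless of tag type, so the same call works for the header row as well if you pass skip_header=False.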