Reputation: 105
I'm working with BeautifulSoup in Python to scrape a webpage. The HTML in question looks like this:
<td><a href="blah.html">blahblah</a></td>
<td>line2</td>
<td></td>
I want to extract the contents of each td tag. For the first td I need the "blahblah" text, for the next td I want "line2", and for the last td I want "blank" because it has no content.
My code snippet looks like this:
row = []
for each_td in td:
    link = each_td.find_all('a')
    if link:
        row.append(link[0].contents[0])
        row.append(link[0]['href'])
    elif each_td.contents[0] is None:
        row.append('blank')
    else:
        row.append(each_td.contents[0])
print row
However, when I run it, I get this error:
elif each_td.contents[0] is None:
IndexError: list index out of range
Note: I am working with BeautifulSoup.
How do I test for a td with no content and write "blank" in that case? Why is the "... is None" check not working?
Upvotes: 3
Views: 8885
Reputation: 69
You can handle the exception. The code is below:
try:
    row.append(each_td.contents[0])
except IndexError:
    # do what is required if the cell is empty, e.g.
    row.append('blank')
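To see why the exception is raised at all, here is a minimal sketch using a plain Python list in place of each_td.contents. The helper name first_or_blank and its default argument are hypothetical, introduced only for illustration.

```python
def first_or_blank(contents, default="blank"):
    """Return the first item of contents, or default when the list is empty."""
    try:
        return contents[0]
    except IndexError:
        # An empty list has no index 0, so the lookup itself raises.
        # Any "is None" comparison written after it is never reached.
        return default


print(first_or_blank(["line2"]))  # line2
print(first_or_blank([]))         # blank
```

The same pattern applies to each_td.contents, since it behaves like an ordinary list of child nodes.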
Upvotes: 1
Reputation:
Who said that 'contents' always has at least one element? Evidently you have hit a case where 'contents' is empty, which is why you get this error.
A more appropriate check would be:
if each_td.contents:
or
if len(each_td.contents) > 0:
Your assumption that 'contents' is never empty is simply wrong.
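Put together with the loop from the question, the emptiness check might look like the sketch below. It assumes Python 3 with the bs4 package installed and the built-in "html.parser" backend; the table and tr wrapper around the cells is made up so the snippet can run on its own.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question; the <table>/<tr> wrapper is an assumption.
html = """
<table><tr>
<td><a href="blah.html">blahblah</a></td>
<td>line2</td>
<td></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

row = []
for each_td in soup.find_all("td"):
    link = each_td.find("a")
    if link is not None:
        # Cell holds a link: keep both its text and its target.
        row.append(link.get_text())
        row.append(link["href"])
    elif not each_td.contents:
        # Empty cell: contents is an empty list, which is falsy.
        row.append("blank")
    else:
        # Plain text cell: convert the first child node to a regular string.
        row.append(str(each_td.contents[0]))

print(row)
```

Testing the list itself, rather than indexing into it, avoids the IndexError because nothing is looked up until the list is known to be non-empty.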
Upvotes: 11
Reputation: 13356
You can use .text to get the text.
row = []
for each_td in td:
    row.append(each_td.text)
print row
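Note that .text returns an empty string for an empty cell rather than "blank", and it does not capture the href. A possible variation, assuming Python 3 with bs4 and the "html.parser" backend, uses get_text(strip=True) and falls back to "blank" when the result is empty. The sample markup is the one from the question with a made-up table wrapper.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question; the <table>/<tr> wrapper is an assumption.
html = (
    '<table><tr>'
    '<td><a href="blah.html">blahblah</a></td>'
    '<td>line2</td>'
    '<td></td>'
    '</tr></table>'
)

soup = BeautifulSoup(html, "html.parser")

# An empty string is falsy, so "or" substitutes the placeholder for empty cells.
row = [td.get_text(strip=True) or "blank" for td in soup.find_all("td")]

print(row)
```

This is shorter than checking contents by hand, at the cost of dropping the link target; if the href is needed, the explicit loop is the better fit.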
Upvotes: 4