Reputation: 159
I'm scraping a webpage, and writing the output to a .csv. I'm getting a "list index out of range" error. I think I understand what the error means, but I'm uncertain how to fix it.
The HTML code that houses the containers over which I want to iterate looks like this:
<tr class="featured even" role="row"><td class="sorting_1 dcLogo">
<a href="company/company">
<img src="URL" alt="Company Name" width="50">
</a>
</td><td class="dcCompanyName"><a href="URL">Company Name</a></td><td class="dcBoothLabel">9999</td><td class="dcCategories">Widget 1, Widget 2, Widget 3</td><td class="dcCityState">CITY, STATE<br/></td><td class="dcCountry">US</td><td style="visibility:hidden;display:none;">4</td></tr>
My code looks like this:
page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll('tr')
del containers[8]
company_names = []
booth_numbers = []
categories = []
countries = []
print("generating csv")
with open('CompanyList.csv','w') as f:
    csv_out = csv.writer(f)
    csv_out.writerow(["company_name", "booth_number", "category", "country"])
    for container in containers:
        cols = container.findAll("td")
        company_name = cols[1].find("a").text
        booth_number = cols[2].text
        category = cols[3].text.strip()
        country = cols[5].text
        company_names.append(company_name)
        booth_numbers.append(booth_number)
        categories.append(category)
        countries.append(country)
        csv_out.writerow([company_name, booth_number, category, country])
    f.close
print('Done Writing to File')
When I run this, I get an "IndexError: list index out of range" error pointing at this line:
booth_number = cols[3].text
Any help would be greatly appreciated.
Upvotes: 1
Views: 72
Reputation: 2674
Some of the rows in the table don't have as many columns as you're expecting. It looks like you expect a consistent number of columns, so you just need to check the length before you start indexing into the row, like this:
for container in containers:
    cols = container.findAll("td")
    # Only index into rows that have the expected number of cells
    if len(cols) == 7:
        company_name = cols[1].find("a").text
        booth_number = cols[2].text
        category = cols[3].text.strip()
        country = cols[5].text
        company_names.append(company_name)
        booth_numbers.append(booth_number)
        categories.append(category)
        countries.append(country)
        csv_out.writerow([company_name, booth_number, category, country])
I'm assuming there will be 7 columns, since that's how many the sample row has, but you can change that to whatever it should be.
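As a minimal, self-contained sketch of that check: the function below takes rows that have already been reduced to lists of cell text (an assumption made so the example runs without the scraped page; in the question each list would come from the `td` elements of a `tr`), skips any row that does not have the expected number of cells, and writes the rest. The names `write_company_rows` and `EXPECTED_COLS` are made up for illustration.

```python
import csv
import io

EXPECTED_COLS = 7  # hypothetical constant: cell count of the sample row in the question


def write_company_rows(rows, out):
    """Write one CSV line per complete row and return how many rows were skipped."""
    writer = csv.writer(out)
    writer.writerow(["company_name", "booth_number", "category", "country"])
    skipped = 0
    for cells in rows:
        # Rows with an unexpected cell count are skipped instead of indexed.
        if len(cells) != EXPECTED_COLS:
            skipped += 1
            continue
        writer.writerow([cells[1], cells[2], cells[3].strip(), cells[5]])
    return skipped


# Made-up sample data: one empty row and one complete row.
sample_rows = [
    [],
    ["", "Company Name", "9999", " Widget 1, Widget 2 ", "CITY, STATE", "US", "4"],
]
buffer = io.StringIO()
skipped_count = write_company_rows(sample_rows, buffer)
print(skipped_count)
print(buffer.getvalue())
```

Counting the skipped rows makes it easy to notice if the filter is discarding more than expected. When writing to a real file in Python 3, the csv module documentation recommends opening it with `newline=""`.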
Upvotes: 0
Reputation: 364
The problem is that the cols list is too short to contain the element you are trying to access. In the example
booth_number = cols[3].text
the cols list has a length of 3 or less. List indexing is zero-based (the first element has an index of 0), so an index of 3 refers to the fourth element, which is outside the range of the list.
You can remedy this with a check for the length before accessing the element.
if len(cols) > 3:
    booth_number = cols[3].text
That way, if the booth number is not in cols, your program does not fail and stop.
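If only some cells may be missing, the same length check can be wrapped in a small helper so each lookup falls back to a default value. This is only a sketch: `cell_or_default` is a hypothetical name, and the row is a plain list standing in for the text of the `td` cells.

```python
def cell_or_default(cols, index, default=""):
    """Return cols[index] if that position exists, otherwise the default."""
    # index < len(cols) is the same test as len(cols) > index
    if 0 <= index < len(cols):
        return cols[index]
    return default


short_row = ["logo", "Company Name", "9999"]  # only 3 cells: valid indexes are 0, 1, 2
booth_number = cell_or_default(short_row, 2)
category = cell_or_default(short_row, 3, default="unknown")
print(booth_number, category)
```

Whether a missing cell should become an empty string, a placeholder, or cause the whole row to be skipped depends on how the CSV will be used.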
Upvotes: 1
Reputation: 49803
There are not as many columns as you are assuming.
You can see how many columns there are with len(cols), and based on that, decide what to do when the expected column isn't there.
Note that you'll have a similar issue with the line after that.
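One way to decide is to first tally how many cells each row has. The sketch below uses plain lists in place of the parsed rows (an assumption so that it runs on its own); with the question's code, each length would come from `len(container.findAll("td"))`. A count of 0 often points to a header row, which typically uses `th` cells rather than `td`, though that is only a guess about this particular page.

```python
from collections import Counter

# Made-up stand-ins for the cell lists of each table row.
rows = [
    [],                                   # e.g. a header row with no td cells
    ["logo", "Name A", "1", "Widgets", "City", "US", "4"],
    ["logo", "Name B", "2", "Widgets", "City", "US", "4"],
    ["logo", "Name C"],                   # a short row
]

# How many rows have each cell count
length_counts = Counter(len(cells) for cells in rows)
# Positions of rows too short for the highest index the question uses, cols[5]
too_short = [i for i, cells in enumerate(rows) if len(cells) <= 5]

print(length_counts)
print(too_short)
```

Looking at the offending rows directly usually makes it clear whether they should be skipped or handled separately.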
Upvotes: 0