PythonFisher

Reputation: 159

How to fix "List Index Out of Range" error

I'm scraping a webpage, and writing the output to a .csv. I'm getting a "list index out of range" error. I think I understand what the error means, but I'm uncertain how to fix it.

The HTML code that houses the containers over which I want to iterate looks like this:

<tr class="featured even" role="row"><td class="sorting_1 dcLogo">
    <a href="company/company">
    <img src="URL" alt="Company Name" width="50">
    </a>
    </td><td class="dcCompanyName"><a href="URL">Company Name</a></td><td class="dcBoothLabel">9999</td><td class="dcCategories">Widget 1, Widget 2, Widget 3</td><td class="dcCityState">CITY, STATE<br/></td><td class="dcCountry">US</td><td style="visibility:hidden;display:none;">4</td></tr>

My code looks like this:


page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll('tr')
del containers[8]

company_names = []
booth_numbers = []
categories = []
countries = []

print("generating csv")
with open('CompanyList.csv','w') as f:
    csv_out = csv.writer(f)
    csv_out.writerow(["company_name", "booth_number", "category", "country"])
    for container in containers:
            cols = container.findAll("td")
            company_name = cols[1].find("a").text
            booth_number = cols[2].text
            category = cols[3].text.strip()
            country = cols[5].text

            company_names.append(company_name)
            booth_numbers.append(booth_number)
            categories.append(category)
            countries.append(country)

            csv_out.writerow([company_name, booth_number, category, country])

print('Done Writing to File')

When I run this, I get an "IndexError: list index out of range" error pointing at this line:

booth_number = cols[3].text

Any help would be greatly appreciated.

Upvotes: 1

Views: 72

Answers (3)

Tom Dee

Reputation: 2674

Some of the rows in the table don't have as many columns as you're expecting. You seem to assume a consistent number of columns, so check the length before you start indexing the row, like this:

for container in containers:
    cols = container.findAll("td")
    if len(cols) == 7:
        company_name = cols[1].find("a").text
        booth_number = cols[2].text
        category = cols[3].text.strip()
        country = cols[5].text

        company_names.append(company_name)
        booth_numbers.append(booth_number)
        categories.append(category)
        countries.append(country)

        csv_out.writerow([company_name, booth_number, category, country])

I'm assuming there will be 7 columns because that's how many the sample row has, but you can change that to whatever it should be.
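
A minimal sketch of the same guard, assuming a full data row has 7 cells as in the sample row. The names `extract_row`, `EXPECTED_COLS`, and the stand-in `rows` data are hypothetical, and plain lists of strings replace the real `td` elements so the snippet runs without BeautifulSoup. It skips short rows with `continue` and counts them, and it writes to an in-memory buffer instead of a file:

```python
import csv
import io

# Assumption: a full data row has 7 <td> cells, as in the sample row.
EXPECTED_COLS = 7


def extract_row(cell_texts):
    """Return [company, booth, category, country], or None for a short row."""
    if len(cell_texts) < EXPECTED_COLS:
        return None
    return [
        cell_texts[1].strip(),
        cell_texts[2].strip(),
        cell_texts[3].strip(),
        cell_texts[5].strip(),
    ]


# Stand-in data. In the real script each list would come from something like
# [td.get_text() for td in container.findAll("td")].
rows = [
    ["", "Company Name", "9999", "Widget 1, Widget 2", "CITY, STATE", "US", "4"],
    [],  # e.g. a header row built from <th> cells, so it has no <td> at all
]

buffer = io.StringIO()  # in-memory stand-in for the CSV file
writer = csv.writer(buffer)
writer.writerow(["company_name", "booth_number", "category", "country"])

skipped = 0
for cell_texts in rows:
    record = extract_row(cell_texts)
    if record is None:
        skipped += 1  # count short rows instead of crashing on them
        continue
    writer.writerow(record)

csv_text = buffer.getvalue()
```

Counting the skipped rows makes it easy to check afterwards whether the filter dropped more than the header or spacer rows you expected.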

Upvotes: 0

drxl

Reputation: 364

The problem is that the cols list has fewer elements than the index you are trying to access requires. In the example

booth_number = cols[3].text

the cols array has a length of 3 or less because the array indexing is zero-based (element 1 has index of 0). When you try to access the fourth element with an index of 3, you are accessing an element outside of the range.

You can remedy this with a check for the length before accessing the element.

if len(cols) > 3:
     booth_number = cols[3].text

That way, if the booth number is not in cols, your program does not fail and stop.
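
One caveat with a bare `if`: when the check fails, `booth_number` is never assigned for that row, so it either keeps the value from the previous row or is undefined on the first pass. A small sketch of one way around that, using a hypothetical helper `cell_or_default` and plain strings standing in for the cell texts:

```python
def cell_or_default(cells, index, default=""):
    """Return cells[index] if that position exists, otherwise the default."""
    return cells[index] if 0 <= index < len(cells) else default


# Stand-in for a short row: only two cells instead of the expected seven.
short_row = ["", "Company Name"]

company_name = cell_or_default(short_row, 1)  # position exists
booth_number = cell_or_default(short_row, 2)  # position missing, falls back to ""
```

Every field then gets a value on every row, so the CSV keeps a consistent number of columns even when a cell is missing.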

Upvotes: 1

Scott Hunter

Reputation: 49803

There are not as many columns as you are assuming.

You can see how many columns there are with len(cols), and based on that, decide what to do when this expected column isn't there.

Note that you'll have a similar issue with the line after that.
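
To see which row shapes actually occur before deciding how to handle them, a quick tally of `len(cols)` per row can help. This is only a sketch: `column_count_histogram` is a hypothetical helper, and the nested lists stand in for the `cols` lists the real loop would produce.

```python
from collections import Counter


def column_count_histogram(rows):
    """Map each distinct column count to the number of rows that have it."""
    return Counter(len(row) for row in rows)


# Stand-in data: two full rows, one empty row, one three-cell row.
rows = [["a"] * 7, ["b"] * 7, [], ["c"] * 3]

histogram = column_count_histogram(rows)
for column_count, row_count in sorted(histogram.items()):
    print(f"{row_count} row(s) with {column_count} column(s)")
```

If almost all rows share one count, the odd ones are probably header or spacer rows that can safely be skipped.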

Upvotes: 0
