mutantChickenHer0

Reputation: 223

Issues with appending to list - IndexError: list index out of range

I'm building a simple web scraping script to get my feet wet with Python. I'm hitting a bit of an issue with the following:

#Create 3 different lists to populate.
mails = []
phones = []
webs = []

def go_get_info(info):
    for item in info:
        #email = (item.contents[3].find_all("span", {"class": "text"})[0].text).strip()
        #phone = (item.contents[3].find_all("span", {"class": "text"})[1].text).strip()
        www = (item.contents[3].find_all("span", {"class": "text"})[2].text).strip()
        if not www:
            webs.append("empty")
        else:
            webs.append(www)

The idea is that I would get the email, phone, and web address into each of the three lists, zip them together, and then iterate through and write to CSV.

The only value here that I seem to have an issue with is www (which is why, as you can see, it's the only line I've left uncommented). I've also tried to mitigate the issue by adding a check for an empty value.

When I run the script that calls this function, I get the following:

± |add-csv-support U:1 ?:1 ✗| → python scrape.py 
Traceback (most recent call last):
  File "scrape.py", line 55, in <module>
    go_retrieve_contact(get_venue_link_list(links))
  File "scrape.py", line 30, in go_retrieve_contact
    go_get_info(info)
  File "scrape.py", line 43, in go_get_info
    www = (item.contents[3].find_all("span", {"class": "text"})[2].text).strip()
IndexError: list index out of range

It makes sense to me that the issue is either with the value being returned or the lack of a value. I've googled but couldn't find a complete solution.

What could I do in this case to

A) Better understand what's happening and debug more effectively.

B) Solve the problem.

Thanks,

Upvotes: 0

Views: 2393

Answers (1)

Greg Friedman

Reputation: 341

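The problem is that this line indexes into two lists: item.contents[3] asks for the fourth child of item, and find_all(...)[2] asks for the third matching span. For at least one item, one of those two lists does not have that many elements, which is what "list index out of range" means. Because both lookups sit on the same line, the traceback alone does not tell you which one failed.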

www = (item.contents[3].find_all("span", {"class": "text"})[2].text).strip()
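To see which of the two lookups is failing, it can help to split the chained expression and check each length separately. The following is only a rough sketch: describe_lookup is a hypothetical helper name, and it assumes each item exposes .contents and that its fourth child supports find_all, as in the question's code.

```python
def describe_lookup(item, contents_index=3, span_index=2):
    """Report which of the two index lookups would fail for one item.

    Hypothetical debugging helper; the default indexes mirror the
    question's item.contents[3] and find_all(...)[2].
    """
    children = item.contents
    if len(children) <= contents_index:
        return f"contents has only {len(children)} children"

    spans = children[contents_index].find_all("span", {"class": "text"})
    if len(spans) <= span_index:
        return f"find_all returned only {len(spans)} spans"

    return "ok"
```

Calling print(describe_lookup(item)) at the top of the for loop should show, for the item that crashes, whether the page is missing the whole block or just the third span.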

Since this is part of a scraping tool, and not every page will have every field, you should check how many elements find_all returned before indexing into the result. Either nest the lookup in an if len((...).find_all(...)) >= 3 statement, or wrap it in try/except IndexError.
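Here is one way that could look, as a sketch rather than a drop-in fix. text_at is a hypothetical helper, and returning a list of rows instead of appending to three global lists is my own choice to keep the fields aligned. Skipping items with fewer than four children is also an assumption; you may prefer to record an empty row instead.

```python
def text_at(spans, index, default="empty"):
    """Return the stripped text of spans[index], or default if missing or blank."""
    try:
        text = spans[index].text.strip()
    except IndexError:
        # The page had fewer matching spans than expected.
        return default
    return text or default


def go_get_info(info):
    """Collect (email, phone, web) rows; missing fields become "empty"."""
    rows = []
    for item in info:
        if len(item.contents) < 4:
            # Assumption: an item without a fourth child has no contact block.
            continue
        spans = item.contents[3].find_all("span", {"class": "text"})
        rows.append((text_at(spans, 0), text_at(spans, 1), text_at(spans, 2)))
    return rows
```

Each row can then be passed straight to csv.writer(...).writerow(row) without zipping anything. One caveat: this only guards against missing spans. If a page omits the email span and the others shift up a position, indexing by position cannot tell which field is which, so you would need something more specific in the markup to select on.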

Upvotes: 1
