Reputation: 223
I'm building a simple web scraping script to get my feet wet with Python. I'm hitting a bit of an issue with the following:
# Create 3 different lists to populate.
mails = []
phones = []
webs = []

def go_get_info(info):
    for item in info:
        #email = (item.contents[3].find_all("span", {"class": "text"})[0].text).strip()
        #phone = (item.contents[3].find_all("span", {"class": "text"})[1].text).strip()
        www = (item.contents[3].find_all("span", {"class": "text"})[2].text).strip()
        if not www:
            webs.append("empty")
        else:
            webs.append(www)
The idea is that I would get the email, phone, and web address into each of the three lists, zip them together, and then iterate through and write to CSV.
The only value here that I seem to have an issue with is www (so, as you can see, I've left it uncommented). I've also tried to mitigate the issue by adding a check for an empty value.
When I run the script that calls this function, I get the following:
± |add-csv-support U:1 ?:1 ✗| → python scrape.py
Traceback (most recent call last):
  File "scrape.py", line 55, in <module>
    go_retrieve_contact(get_venue_link_list(links))
  File "scrape.py", line 30, in go_retrieve_contact
    go_get_info(info)
  File "scrape.py", line 43, in go_get_info
    www = (item.contents[3].find_all("span", {"class": "text"})[2].text).strip()
IndexError: list index out of range
It makes sense to me that the issue is either with the value being returned or the lack of a value. I've googled but couldn't find a complete solution.
What could I do in this case to:
A) Better understand what's happening and debug more effectively.
B) Solve the problem.
Thanks,
Upvotes: 0
Views: 2393
Reputation: 341
The problem is that you are indexing the fourth element of item.contents (item.contents[3]) and the third element of the find_all result (find_all(...)[2]), and one of those two lists does not have that many elements. That is what "list index out of range" means.
www = (item.contents[3].find_all("span", {"class": "text"})[2].text).strip()
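To find out which of the two lookups is failing, it may help to split the chained expression into separate steps and check the length at each one. Here is a minimal sketch of that idea. The helper name checked_index is hypothetical, and plain lists stand in for item.contents and the find_all(...) result so the example runs without any scraping:

```python
def checked_index(seq, index, label):
    """Return seq[index] for a non-negative index, or raise an
    IndexError whose message names the step that failed."""
    if index >= len(seq):
        raise IndexError(
            f"{label}: wanted index {index}, but only {len(seq)} element(s)"
        )
    return seq[index]

# Stand-ins (assumptions): real code would use item.contents and
# the list returned by find_all("span", {"class": "text"}).
contents = ["\n", "<h2>", "\n", "<div>"]
spans = ["mail@example.com", "555-0100"]  # only two spans: no website

container = checked_index(contents, 3, "item.contents")
try:
    checked_index(spans, 2, "find_all spans")
except IndexError as exc:
    message = str(exc)
    print(message)
```

In this made-up data the first lookup succeeds and the second one reports that only two spans were found, which would point to a venue that simply has no website listed.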
Since this is part of a scraping tool, you might want to check how many elements find_all returns by wrapping the lookup in an if len((...).find_all(...)) >= 3 statement, or use try/except to catch the IndexError.
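Both options could look roughly like the sketch below. The helper names nth_or_default and nth_or_default_try are hypothetical, and each inner list stands in for the stripped texts of the spans that find_all would return for one venue:

```python
def nth_or_default(seq, index, default="empty"):
    """Length check: return seq[index] only if the list is long enough."""
    if len(seq) > index:
        return seq[index]
    return default

def nth_or_default_try(seq, index, default="empty"):
    """try/except: attempt the lookup and fall back on IndexError."""
    try:
        return seq[index]
    except IndexError:
        return default

webs = []

# Stand-in data (assumption): one list of span texts per venue.
scraped = [
    ["a@example.com", "555-0100", " www.example.com "],
    ["b@example.com", "555-0101"],  # no website span at all
]

for texts in scraped:
    www = nth_or_default(texts, 2, default="").strip()
    # Keep the original behaviour of storing "empty" for blank values.
    webs.append(www if www else "empty")

print(webs)
```

Either helper keeps the three lists the same length, so zipping them together for the CSV still lines up row by row.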
Upvotes: 1