How To Fix This Error:
Traceback (most recent call last):
  File "scrap.py", line 37, in <module>
    code()
  File "scrap.py", line 34, in code
    s.write(str(g_name[i].text) + ',' + str(phone[i].text) + ',' + str(website[i].text) + ',' + str(reviews[i].text) + '\n')
IndexError: list index out of range
I have tried to fix it again and again, but every time I fail.
What does this error mean, and why am I getting it?
Here is my code:
from selenium import webdriver

# Note: find_elements_by_css_selector() is the Selenium 3 API; it was removed in
# Selenium 4.3 in favour of find_elements(By.CSS_SELECTOR, ...).
driver = webdriver.Chrome()
for url in urls:  # `urls` is not shown in the question
    if str(url) == '0':
        driver.get('https://www.google.com/search?tbm=lcl&ei=kALeXauoIMWasAfc27TAAQ&q=software+house+in+johar+town+lahore&oq=software+house+in+johar+town+lahore&gs_l=psy-ab.3...0.0.0.96329.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.tvP3qqno_1Q')
    else:
        driver.get('https://www.google.com/search?tbm=lcl&sxsrf=ACYBGNTndl0R6IJRm1LcZ_bQJ14a-C3ocQ%3A1574830560313&ei=4AHeXc7kErH5sAfYr4PQCg&q=software+house+in+johar+town+lahore&oq=software+house+in+johar+town+lahore&gs_l=psy-ab.3...0.0.0.4519.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.S1G_WpFjvhI#rlfi=hd:;si:;mv:[[31.475505499999997,74.30897639999999],[31.4553548,74.2472458]];start:'+ str(url))
    # Each field is collected into its own list (or the string 'NONE' if nothing matched)
    if (driver.find_elements_by_css_selector('.dbg0pd div')):
        g_name = driver.find_elements_by_css_selector('.dbg0pd div')
    else:
        g_name = 'NONE'
    if (driver.find_elements_by_css_selector('.lqhpac div:nth-child(3) span')):
        phone = driver.find_elements_by_css_selector('.lqhpac div:nth-child(3) span')
    else:
        phone = 'NONE'
    if (driver.find_elements_by_css_selector('.L48Cpd .wLAgVc')):
        website = driver.find_elements_by_css_selector('.L48Cpd .wLAgVc')
    else:
        website = 'NONE'
    if (driver.find_elements_by_css_selector('.BTtC6e')):
        reviews = driver.find_elements_by_css_selector('.BTtC6e')
    else:
        reviews = 'NONE'
    # Number of rows to write, taken from g_name
    items = len(g_name)
    with open('johartown.csv','a',encoding="utf-8") as s:
        for i in range(items):
            s.write(str(g_name[i].text) + ',' + str(phone[i].text) + ',' + str(website[i].get_attribute('href')) + ',' + str(reviews[i].text) + '\n')
Upvotes: 0
Views: 79
Reputation: 3118
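You define the loop range with items = len(g_name), i.e. by the length of g_name alone. The length of g_name is greater than the length of one or more of phone, website, or reviews, so as soon as i reaches an index that exists in g_name but not in the shorter list, indexing that list raises the IndexError you are getting.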
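A minimal reproduction of the error with plain Python lists, using made-up values in place of the Selenium elements: the loop bound comes from the longer list, so the shorter list runs out first.

```python
# Made-up values standing in for the scraped lists: three names, two phones.
g_name = ["Alpha Soft", "Beta Labs", "Gamma Tech"]
phone = ["042-111", "042-222"]


def build_rows(names, phones):
    rows = []
    # Same pattern as the question: the bound comes from names alone.
    for i in range(len(names)):
        rows.append(names[i] + "," + phones[i])  # phones[2] does not exist
    return rows


try:
    build_rows(g_name, phone)
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```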
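You must make sure that items is bounded by the length of the shortest of your data objects. On the other hand, the actual problem you are facing here is that your selectors are unable to deal with elements that are missing on the website: a listing without a phone number simply contributes nothing to phone, so phone[i] may not even belong to the same business as g_name[i].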
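A minimal sketch of bounding the loop by the shortest list, again with made-up plain lists rather than Selenium elements. zip() stops at the shortest input, so the IndexError goes away, but when a listing has no phone the remaining numbers shift onto the wrong businesses:

```python
# Made-up data: "Beta Labs" has no phone on the page, so only two numbers were found.
g_name = ["Alpha Soft", "Beta Labs", "Gamma Tech"]
phone = ["042-111", "042-333"]  # the numbers of Alpha Soft and Gamma Tech

# zip() stops at the shortest list, which is equivalent to looping over
# range(min(len(g_name), len(phone))).
rows = [name + "," + tel for name, tel in zip(g_name, phone)]
print(rows)
# ['Alpha Soft,042-111', 'Beta Labs,042-333']  <- Gamma Tech's number, wrong business
```

Truncating to the shortest list therefore only hides the symptom; the data itself is still mismatched.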
I would suggest rewriting your logic so that you parse the content holders (the elements that contain all of the fields you need) rather than the fields themselves, and then add rules within that logic to handle the CSS selectors that find nothing.
In layman's terms: do not look for names, phones, websites, and reviews separately; instead, look for the "users" (the individual result entries) and define a parser that goes through each of them and extracts the data you need.
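A minimal sketch of that container-based approach, assuming Selenium 4 and that each result listing is wrapped in a single element you can select. The field selectors are copied from the question and may no longer match Google's markup; the container selector in the usage comment ("div.listing") is a hypothetical placeholder, and first_value, parse_listing and write_listings are illustrative names. Each field is looked up inside its own listing, so a missing phone yields "NONE" for that row instead of shifting every later row:

```python
import csv

# Selenium's By.CSS_SELECTOR constant is the string "css selector"; the literal is
# used so this sketch can be imported and tested without Selenium installed.
CSS = "css selector"

# Field selectors copied from the question (assumption: still valid on the page).
FIELDS = [
    (".dbg0pd div", "text"),                    # name
    (".lqhpac div:nth-child(3) span", "text"),  # phone
    (".L48Cpd .wLAgVc", "href"),                # website
    (".BTtC6e", "text"),                        # reviews
]


def first_value(container, selector, attr, default="NONE"):
    """Return the first match's text or attribute inside one listing, or default."""
    matches = container.find_elements(CSS, selector)
    if not matches:
        return default  # this field is missing for this listing only
    element = matches[0]
    value = element.text if attr == "text" else element.get_attribute(attr)
    return value or default  # get_attribute() returns None when the attribute is absent


def parse_listing(container):
    """Extract one row (name, phone, website, reviews) from a single listing element."""
    return [first_value(container, sel, attr) for sel, attr in FIELDS]


def write_listings(containers, path):
    """Append one CSV row per listing; csv.writer quotes values that contain commas."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for container in containers:
            writer.writerow(parse_listing(container))


# Usage with Selenium 4 (not run here). "div.listing" is a hypothetical
# placeholder for whatever element wraps a single result:
#
#   from selenium.webdriver.common.by import By
#   containers = driver.find_elements(By.CSS_SELECTOR, "div.listing")
#   write_listings(containers, "johartown.csv")
```

Because every value in a row is read from the same listing element, the fields stay together by construction, which removes both the IndexError and the misalignment. Writing through the csv module also keeps rows intact when a value happens to contain a comma.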
Upvotes: 2