Scraping error: "IndexError: list index out of range" while writing to CSV in Python

Question

How do I fix this error?

Traceback (most recent call last):
  File "scrap.py", line 37, in <module>
    code()
  File "scrap.py", line 34, in code
    s.write(str(g_name[i].text) + ',' + str(phone[i].text) + ',' + str(website[i].text) + ',' + str(reviews[i].text) + '\n')
IndexError: list index out of range

I have tried to fix it again and again, but every time I fail.

What does this error mean, and why am I getting it?

Here is my code:

driver = webdriver.Chrome()
for url in urls:
    if str(url) == '0':
        driver.get('https://www.google.com/search?tbm=lcl&ei=kALeXauoIMWasAfc27TAAQ&q=software+house+in+johar+town+lahore&oq=software+house+in+johar+town+lahore&gs_l=psy-ab.3...0.0.0.96329.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.tvP3qqno_1Q')
    else:
        driver.get('https://www.google.com/search?tbm=lcl&sxsrf=ACYBGNTndl0R6IJRm1LcZ_bQJ14a-C3ocQ%3A1574830560313&ei=4AHeXc7kErH5sAfYr4PQCg&q=software+house+in+johar+town+lahore&oq=software+house+in+johar+town+lahore&gs_l=psy-ab.3...0.0.0.4519.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.S1G_WpFjvhI#rlfi=hd:;si:;mv:[[31.475505499999997,74.30897639999999],[31.4553548,74.2472458]];start:'+ str(url))
    if (driver.find_elements_by_css_selector('.dbg0pd div')):
        g_name = driver.find_elements_by_css_selector('.dbg0pd div')
    else:
        g_name = 'NONE'
    if (driver.find_elements_by_css_selector('.lqhpac div:nth-child(3) span')):
        phone = driver.find_elements_by_css_selector('.lqhpac div:nth-child(3) span')
    else:
        phone = 'NONE'
    if (driver.find_elements_by_css_selector('.L48Cpd .wLAgVc')):
        website = driver.find_elements_by_css_selector('.L48Cpd .wLAgVc')
    else:
        website = 'NONE'
    if (driver.find_elements_by_css_selector('.BTtC6e')):
        reviews = driver.find_elements_by_css_selector('.BTtC6e')
    else:
        reviews = 'NONE'

    items = len(g_name)

    with open('johartown.csv','a',encoding="utf-8") as s:
        for i in range(items):
            s.write(str(g_name[i].text) + ',' + str(phone[i].text) + ',' + str(website[i].get_attribute('href')) + ',' + str(reviews[i].text) + '\n')

Simas Joneliunas · Accepted Answer

You set the loop range with items = len(g_name), so the loop runs once for every element of g_name. At least one of phone, website, or reviews is shorter than g_name, so indexing it with i eventually runs past its end, and that is why you are getting the error.
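The failure can be reproduced without a browser. The following is only a minimal sketch: plain lists stand in for the Selenium results, the sample values are invented, and first_bad_index is a hypothetical helper written for this illustration.

```python
# Stand-ins for the element lists returned by Selenium; the values are invented.
g_name = ["Alpha Soft", "Beta Labs", "Gamma Tech"]
phone = ["042-111", "042-222"]  # one listing on the page has no phone number


def first_bad_index(names, other):
    """Loop over range(len(names)) as the question does and
    return the first index that `other` cannot serve, or None."""
    for i in range(len(names)):
        try:
            other[i]
        except IndexError:
            return i
    return None


print(first_bad_index(g_name, phone))  # prints 2
```

Note that the 'NONE' fallback in the question does not protect against this: it is a four-character string, so len('NONE') is 4 and indexing it yields single characters, which have no .text attribute.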

To avoid it, you can:

make sure that all of these objects have the same length;
add checks so that an object is only accessed if the required index exists;
define items as the length of the shortest of your data objects.
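The last option could look roughly like this. It is a sketch with plain lists and made-up values, and aligned_rows is a hypothetical helper name; zip stops at the shortest input, which has the same effect as looping over the minimum length.

```python
def aligned_rows(*columns):
    """Build rows from parallel columns, stopping at the shortest column."""
    return [list(row) for row in zip(*columns)]


# Invented sample data: three names but only two phone numbers.
names = ["Alpha Soft", "Beta Labs", "Gamma Tech"]
phones = ["042-111", "042-222"]

for row in aligned_rows(names, phones):
    print(",".join(row))
```

This removes the IndexError, but it silently drops the extra rows, and it cannot tell which listing the missing phone number belonged to, so values may end up next to the wrong name.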

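However, the underlying problem is that your selectors cannot deal with elements that are missing from the page. Each selector collects matches from the whole page, so a single listing without a phone number or website makes that list shorter than g_name and shifts every later value onto the wrong row.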

I would suggest rewriting your logic so that you parse content holders (elements that contain all of the fields you need) rather than the fields themselves, and then define rules within that logic to handle fields whose CSS selector matches nothing.

In layman's terms, do not look for names, phones, websites, and reviews. Instead, look for "users", and then define a parser that goes through all of the "users" and extracts the data that you need from each one.
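A rough sketch of that idea follows. The field selectors are copied from the question, but whether they still match when applied inside a single container is an assumption, and parse_listing and write_listings are hypothetical names. It relies only on find_elements(by, value) and get_attribute, which Selenium elements provide.

```python
import csv

CSS = "css selector"  # the string value of Selenium's By.CSS_SELECTOR

# (selector, attribute) pairs in output order; attribute None means "use .text".
# Selectors come from the question; matching them per container is an assumption.
FIELDS = [
    (".dbg0pd div", None),                    # name
    (".lqhpac div:nth-child(3) span", None),  # phone
    (".L48Cpd .wLAgVc", "href"),              # website
    (".BTtC6e", None),                        # reviews
]


def parse_listing(container, default="NONE"):
    """Extract one CSV row from a single listing container."""
    row = []
    for selector, attribute in FIELDS:
        found = container.find_elements(CSS, selector)
        if not found:
            row.append(default)  # field is missing for this listing only
        elif attribute:
            row.append(found[0].get_attribute(attribute) or default)
        else:
            row.append(found[0].text)
    return row


def write_listings(containers, path):
    """Append one row per container; csv.writer quotes embedded commas."""
    with open(path, "a", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for container in containers:
            writer.writerow(parse_listing(container))
```

The containers would come from something like driver.find_elements(CSS, '<container selector>'), where the container selector has to be found by inspecting the page; none is given in the question. Because every row is built from one container, a missing field only affects its own row.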
