user11960891


Scraping error: "IndexError: list index out of range" while writing to a CSV file in Python

How do I fix this error?

Traceback (most recent call last):
  File "scrap.py", line 37, in <module>
    code()
  File "scrap.py", line 34, in code
    s.write(str(g_name[i].text) + ',' + str(phone[i].text) + ',' + str(website[i].text) + ',' + str(reviews[i].text) + '\n')
IndexError: list index out of range

I have tried to fix this again and again, but I can't.

What does this error mean, and why am I getting it?

Here is my code:

driver = webdriver.Chrome()
for url in urls:
    if str(url) == '0':
        driver.get('https://www.google.com/search?tbm=lcl&ei=kALeXauoIMWasAfc27TAAQ&q=software+house+in+johar+town+lahore&oq=software+house+in+johar+town+lahore&gs_l=psy-ab.3...0.0.0.96329.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.tvP3qqno_1Q')
    else:
        driver.get('https://www.google.com/search?tbm=lcl&sxsrf=ACYBGNTndl0R6IJRm1LcZ_bQJ14a-C3ocQ%3A1574830560313&ei=4AHeXc7kErH5sAfYr4PQCg&q=software+house+in+johar+town+lahore&oq=software+house+in+johar+town+lahore&gs_l=psy-ab.3...0.0.0.4519.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.S1G_WpFjvhI#rlfi=hd:;si:;mv:[[31.475505499999997,74.30897639999999],[31.4553548,74.2472458]];start:'+ str(url))
    if (driver.find_elements_by_css_selector('.dbg0pd div')):
        g_name = driver.find_elements_by_css_selector('.dbg0pd div')
    else:
        g_name = 'NONE'
    if (driver.find_elements_by_css_selector('.lqhpac div:nth-child(3) span')):
        phone = driver.find_elements_by_css_selector('.lqhpac div:nth-child(3) span')
    else:
        phone = 'NONE'
    if (driver.find_elements_by_css_selector('.L48Cpd .wLAgVc')):
        website = driver.find_elements_by_css_selector('.L48Cpd .wLAgVc')
    else:
        website = 'NONE'
    if (driver.find_elements_by_css_selector('.BTtC6e')):
        reviews = driver.find_elements_by_css_selector('.BTtC6e')
    else:
        reviews = 'NONE'

    items = len(g_name)

    with open('johartown.csv','a',encoding="utf-8") as s:
        for i in range(items):
            s.write(str(g_name[i].text) + ',' + str(phone[i].text) + ',' + str(website[i].get_attribute('href')) + ',' + str(reviews[i].text) + '\n')

Upvotes: 0

Views: 79

Answers (1)

Simas Joneliunas

Reputation: 3118

You set the loop range with items = len(g_name), so the loop runs once for every entry in g_name. When g_name is longer than at least one of phone, website, or reviews, indexing the shorter list with i goes past its end, and that is why you get the error.

You can avoid it in one of these ways:

  • make sure that all of these lists have the same length
  • add checks so that you only access a list when the required index exists
  • define items as the length of the shortest of your lists.
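
As a rough illustration of the last two options, the sketch below pairs the lists by position without ever indexing past the end of one. It uses plain strings as stand-ins for the Selenium elements; the sample data and variable names are made up for the example. It also assumes that each field is an ordinary list (empty when nothing matched) rather than the string 'NONE'.

```python
import csv
import io
from itertools import zip_longest

# Stand-in data: plain strings instead of Selenium elements.
names = ["A Soft", "B Tech", "C Labs"]
phones = ["0300-1111111", "0300-2222222"]  # one phone number missing
websites = ["https://a.example"]           # two websites missing
reviews = ["4.5", "3.9", "5.0"]

# Option 1: stop at the shortest list (zip truncates silently).
rows_shortest = [list(row) for row in zip(names, phones, websites, reviews)]

# Option 2: keep every row and pad the shorter lists with a placeholder.
rows_padded = [
    list(row)
    for row in zip_longest(names, phones, websites, reviews, fillvalue="NONE")
]

# Write the padded rows with the csv module, which also quotes
# values that contain commas.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows_padded)
print(buffer.getvalue())
```

Either way the IndexError goes away, but the values are still paired by position only, so a phone number can end up next to the wrong name whenever an earlier listing had none.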

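The underlying problem, though, is that your selectors cannot account for elements that are missing from the page. Each selector returns only the elements that exist, so a listing without a phone number or website makes that list shorter than g_name.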

I would suggest rewriting your logic so that you parse content holders (elements that contain all of your required fields) rather than the fields themselves, and then adding rules within that logic to handle fields whose CSS selectors match nothing.

In layman's terms, do not look for names, phones, websites, and reviews. Instead, look for "users", and then write a parser that goes through all of the "users" and extracts the data that you need.
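
A minimal sketch of that idea follows. The field selectors are the ones from the question, and whether they still match when evaluated inside a single listing is an assumption. The helper names (first_match, text_or_default, parse_listing) are hypothetical, and the selector for the listing container itself is not shown because it has to be found by inspecting the page.

```python
# Field selectors from the question; assumed to work relative to one listing.
NAME_SEL = ".dbg0pd div"
PHONE_SEL = ".lqhpac div:nth-child(3) span"
WEBSITE_SEL = ".L48Cpd .wLAgVc"
REVIEWS_SEL = ".BTtC6e"


def first_match(container, selector):
    """Return the first element matching selector inside container, or None."""
    matches = container.find_elements_by_css_selector(selector)
    return matches[0] if matches else None


def text_or_default(element, default="NONE"):
    """Return the element's text, or default when the element is missing."""
    return element.text if element is not None else default


def parse_listing(container, default="NONE"):
    """Build one CSV row from a single listing container."""
    website = first_match(container, WEBSITE_SEL)
    # get_attribute may return None, so fall back to the default as well.
    href = website.get_attribute("href") if website is not None else None
    return [
        text_or_default(first_match(container, NAME_SEL), default),
        text_or_default(first_match(container, PHONE_SEL), default),
        href or default,
        text_or_default(first_match(container, REVIEWS_SEL), default),
    ]
```

Each listing now produces exactly one row, so a missing phone number or website becomes "NONE" in that row instead of shifting the values of every later listing. The rows could then be written with csv.writer(...).writerows(parse_listing(c) for c in containers), where containers is whatever the driver returns for the listing selector. The sketch keeps the find_elements_by_css_selector call used in the question; newer Selenium releases replace it with find_elements(By.CSS_SELECTOR, selector).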

Upvotes: 2
