kikee1222

Reputation: 2026

List index is out of range - but I'm checking the length before processing

Wondering if you can advise: I get the error below when processing a list of items. I should note that this script works for 99% of items; now that I've expanded the list to 84M rows, I am getting this issue.

For each item, I run this check:

elif len(str(x)) > 3 and str(x[len(x)-2]).rstrip() in cdns:

So how can the index be out of range, if I'm explicitly checking that it's over a certain length before processing?

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-a28be4b396bd> in <module>()
     21     elif len(str(x)) > 4 and str(x[len(x)-2]).rstrip() in cdns:
     22       cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+str(x[len(x)-1]))
---> 23     elif len(str(x)) > 5 and str(x[len(x)-3]).rstrip() in cdns:
     24       cleandomain.append(str(x[len(x)-4])+'.'+str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
     25     #if its in the TLD list, do this

IndexError: list index out of range

The full loop is below. I'd expect that if the index were out of range, the length check would fail and the loop would fall through to the other branches, ending with the else branch that just appends the list value.

  for x in index:
    #if it ends with a number, it's an IP
    if str(x)[-1].isnumeric():
      cleandomain.append(str(x[0])+'.'+str(x[1])+'.*.*')
    # if it's in the CDN list, take a subdomain as well
    elif len(str(x)) > 3 and str(x[len(x)-2]).rstrip() in cdns:
      cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+str(x[len(x)-1]))
    elif len(str(x)) > 4 and str(x[len(x)-3]).rstrip() in cdns:
      cleandomain.append(str(x[len(x)-4])+'.'+str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
    # if it's in the TLD list, do this
    elif len(str(x)) > 3 and str(x[len(x)-2]).rstrip()+'.'+ str(x[len(x)-1]).rstrip() in tld:
      cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
    elif len(str(x)) > 2 and str(x[len(x)-1]) in tld:
      cleandomain.append(str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
    # if it's not in the TLD list, do this
    else:
      cleandomain.append(x)

index is generated as below.

It is a list of lists holding the split-out parts of each domain, like [['news', 'bbc', 'co', 'uk'], ['graph', 'facebook', 'com']], so each x in the loop is one of these inner lists.

import pandas as pd
path = "Desktop/domx.csv"
df = pd.read_csv(path, delimiter=',', header='infer', encoding = "ISO-8859-1")
df2 = df[((df['domain'] != '----'))]
df3 = df2[['domain', 'use']]
for row in df2.iterrows():
  index = df3.domain.str.split('.').tolist()
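For illustration, a dependency-free sketch of what index holds, using a few hypothetical domain values in place of Desktop/domx.csv (whose contents are not shown). For non-null string values, pandas' Series.str.split('.') produces the same lists as Python's str.split('.'), so a list comprehension stands in for it here.

```python
# Hypothetical sample values standing in for the 'domain' column of Desktop/domx.csv
domains = ['news.bbc.co.uk', 'graph.facebook.com', 'localhost', '----']

# Plain-Python equivalent of df3.domain.str.split('.').tolist() after the '----' filter
index = [d.split('.') for d in domains if d != '----']

for x in index:
    # len(x) counts labels; len(str(x)) counts characters of the list's printed form
    print(x, len(x), len(str(x)))
```

The hypothetical 'localhost' row becomes a one-element list, yet its string form "['localhost']" is 13 characters long, so a check like len(str(x)) > 4 passes while x[len(x)-3] has nothing to point at.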

Any help would be great

Upvotes: 1

Views: 86

Answers (1)

Kenny Ostrom

Reputation: 5871

Let me expand on what Corentin Limier said in the comments with a specific counterexample, since you categorically deny this could be true without actually checking in your debugger:

Based on the error dump in your original question:

---> 23 elif len(str(x)) > 5 and str(x[len(x)-3]).rstrip() in cdns:
IndexError: list index out of range

x = ['counterexample']
print ('x =', x)
print ('length of x is', len(x))
print ('length of str(x) is', len(str(x)))

if len(str(x)) > 5:
    print ('You think this is safe')

try:
    x[len(x)-3]
except IndexError:
    print ('but it is not.')

Output:

x = ['counterexample']
length of x is 1
length of str(x) is 18
You think this is safe
but it is not.

You need to know whether the index is valid compared with the number of items in x, which is len(x). What you are actually checking is the length of the string representation of x, which is something completely different. The string is 18 characters long, but there is only one item in the list.
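A minimal sketch of the loop rewritten with len(x) guards, wrapped in a hypothetical helper, clean_domain. The question does not show the real cdns and tld collections, so the sets below are made-up samples. Each guard requires at least as many labels as the most negative index it reads. Python wraps negative indices, which would explain why most rows never crashed: with a two-label list, x[len(x)-3] is x[-1] and quietly succeeds, and only one-label entries raise. One assumption about intent: the IP test is changed to x[-1].isnumeric(), because str(x)[-1] of a list is always ']', so the original IP branch can never fire.

```python
def clean_domain(x, cdns, tld):
    """Collapse one split domain (a list of labels) like the question's loop,
    but guard every index with len(x), the number of labels, not len(str(x))."""
    # IP address: assumes the intent was to test the last label, not str(x)[-1]
    if len(x) >= 2 and x[-1].isnumeric():
        return x[0] + '.' + x[1] + '.*.*'
    # CDN in the second-to-last label: keep one subdomain as well
    if len(x) >= 3 and x[-2].rstrip() in cdns:
        return '.'.join(x[-3:])
    # CDN in the third-to-last label: keep one more subdomain
    if len(x) >= 4 and x[-3].rstrip() in cdns:
        return '.'.join(x[-4:])
    # two-part TLD such as co.uk
    if len(x) >= 3 and x[-2].rstrip() + '.' + x[-1].rstrip() in tld:
        return '.'.join(x[-3:])
    # single-part TLD such as com
    if len(x) >= 2 and x[-1] in tld:
        return '.'.join(x[-2:])
    # nothing matched: keep x unchanged, as the original else branch does
    return x


# Hypothetical sample lookup sets; the real cdns and tld are not shown
cdns = {'cloudfront'}
tld = {'com', 'net', 'co.uk'}

index = [['news', 'bbc', 'co', 'uk'], ['graph', 'facebook', 'com'], ['counterexample']]
cleandomain = [clean_domain(x, cdns, tld) for x in index]
print(cleandomain)
```

The one-element ['counterexample'] now falls through every guard to the final return instead of raising IndexError. Using a function keeps each guard next to the indices it protects; the same checks would work inline in the original for loop.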

PS: Don't feel bad, we have ALL done this. By this, I mean "get blinders when we have written code completely different from what we thought we did." This is one of the primary reasons for "code review" in professional settings.

Upvotes: 3
