Reputation: 2026
I wonder if you can advise: I get the error below when processing a list of items. I should note that this script works for 99% of items; the issue only appeared once I expanded the list to 84M rows.
I do this for each line
elif len(str(x)) > 3 and str(x[len(x)-2]).rstrip() in cdns:
So, I don't see how the index can be out of range, if I'm actively checking if it's over a certain length before processing?
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-2-a28be4b396bd> in <module>()
21 elif len(str(x)) > 4 and str(x[len(x)-2]).rstrip() in cdns:
22 cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+str(x[len(x)-1]))
---> 23 elif len(str(x)) > 5 and str(x[len(x)-3]).rstrip() in cdns:
24 cleandomain.append(str(x[len(x)-4])+'.'+str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
25 #if its in the TLD list, do this
IndexError: list index out of range
The full loop is below. I'd expect that, if the list index were out of range, it would just carry out the other command and print the list value?
for x in index:
    #if it ends with a number, it's an IP
    if str(x)[-1].isnumeric():
        cleandomain.append(str(x[0])+'.'+str(x[1])+'.*.*')
    #if its in the CDN list, take a subdomain as well
    elif len(str(x)) > 3 and str(x[len(x)-2]).rstrip() in cdns:
        cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+str(x[len(x)-1]))
    elif len(str(x)) > 4 and str(x[len(x)-3]).rstrip() in cdns:
        cleandomain.append(str(x[len(x)-4])+'.'+str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
    #if its in the TLD list, do this
    elif len(str(x)) > 3 and str(x[len(x)-2]).rstrip()+'.'+ str(x[len(x)-1]).rstrip() in tld:
        cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
    elif len(str(x)) > 2 and str(x[len(x)-1]) in tld:
        cleandomain.append(str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
    #if its not in the TLD list, do this
    else:
        cleandomain.append(x)
index is generated as below. It is a list of lists (the split-out parts of each domain), like [['news', 'bbc', 'co', 'uk'], ['graph', 'facebook', 'com']], and x is each inner list in turn:
import pandas as pd
path = "Desktop/domx.csv"
df = pd.read_csv(path, delimiter=',', header='infer', encoding = "ISO-8859-1")
df2 = df[((df['domain'] != '----'))]
df3 = df2[['domain', 'use']]
for row in df2.iterrows():
    index = df3.domain.str.split('.').tolist()
Any help would be great
Upvotes: 1
Views: 86
Reputation: 5871
Let me expand on what Corentin Limier said in the comments with a specific counterexample, since you categorically deny this could be true without actually checking in your debugger.
Based on the error dump in your original question:
---> 23 elif len(str(x)) > 5 and str(x[len(x)-3]).rstrip() in cdns:
IndexError: list index out of range
x = ['counterexample']
print('x =', x)
print('length of x is', len(x))
print('length of str(x) is', len(str(x)))
if len(str(x)) > 5:
    print('You think this is safe')
    try:
        x[len(x)-3]
    except IndexError:
        print('but it is not.')
x = ['counterexample']
length of x is 1
length of str(x) is 18
You think this is safe
but it is not.
You need to know if the index is valid, compared to the number of items in x. You are actually looking at the length of the string representation of x, which is completely different. The string is 18 characters long, but there is only one item in the list.
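For what it's worth, here is a minimal sketch of how the loop body might look once the guards compare against len(x), the number of parts, instead of len(str(x)). The function name clean_domain and the sample cdns and tld sets are hypothetical, made up for illustration. Two things differ from your code and are my assumptions about what you intended: str(x)[-1] is always ']' when x is a list, so the IP check here looks at the last character of the last part instead, and the fallback returns the parts joined into a string rather than the raw list.

```python
def clean_domain(parts, cdns, tld):
    """Trim one list of domain parts, e.g. ['news', 'bbc', 'co', 'uk'].

    Hypothetical helper: every guard compares against the number of
    parts, so each index used in a branch is known to exist.
    """
    n = len(parts)  # number of items, NOT len(str(parts))
    if n == 0:
        return ''
    last = parts[-1].rstrip()
    # Assumed intent: an entry whose last part ends in a digit is an IP.
    if n >= 2 and last[-1:].isnumeric():
        return parts[0] + '.' + parts[1] + '.*.*'
    # CDN name in second-to-last position: keep the last three parts.
    if n >= 3 and parts[-2].rstrip() in cdns:
        return '.'.join(parts[-3:])
    # CDN name in third-to-last position: keep the last four parts.
    if n >= 4 and parts[-3].rstrip() in cdns:
        return '.'.join(parts[-4:])
    # Two-part TLD such as 'co.uk': keep the last three parts.
    if n >= 3 and parts[-2].rstrip() + '.' + last in tld:
        return '.'.join(parts[-3:])
    # One-part TLD such as 'com': keep the last two parts.
    if n >= 2 and last in tld:
        return '.'.join(parts[-2:])
    # Nothing matched: return the entry unchanged, joined back together.
    return '.'.join(parts)


# Made-up sample data, only to exercise the function.
cdns = {'cloudfront'}
tld = {'co.uk', 'com'}
index = [['news', 'bbc', 'co', 'uk'],
         ['graph', 'facebook', 'com'],
         ['counterexample']]

cleandomain = [clean_domain(x, cdns, tld) for x in index]
print(cleandomain)  # ['bbc.co.uk', 'facebook.com', 'counterexample']
```

Negative indices such as parts[-2] mean the same thing as parts[len(parts)-2], but they still raise IndexError if the list is too short, which is why each branch checks n first.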
PS: Don't feel bad, we have ALL done this. By this, I mean "get blinders when we have written code completely different from what we thought we did." This is one of the primary reasons for "code review" in professional settings.
Upvotes: 3