mman1235

Reputation: 51

IndexError: string index out of range, can't figure out why

In this part of my code, I want to cut out any non-alphabetical characters from the start and end of the words I get from a file I read.

I understand that the error probably happens because an empty string is being tested, but I can't figure out why, even after trying numerous different versions of the code.

Here's what I have now for it:

for i in given_file:

    cut_it_out = True

    while cut_it_out:
        if len(i) == 0:
            cut_it_out = False
        else:
            while (len(i) != 0) and cut_it_out:
                if i.lower()[0].isalpha() and i.lower()[len(i) - 1].isalpha():
                    cut_it_out = False

                if (not i.lower()[len(i) - 1].isalpha()):
                    i = i[:len(i) - 2]
                if (not i.lower()[0].isalpha()):
                    i = i[1:]

Can anyone help me figure this out? Thanks.

Thanks for the interesting answers :). I want it to be even more precise, but there is an endless loop I can't seem to get rid of.

Can anyone help me figure it out?

all_words = {} # New empty dictionary
for i in given_file:
    if "--" in i:
        split_point = i.index("--")
        part_1 = i[:split_point]
        part_2 = i[split_point + 2:]
        combined_parts = [part_1, part_2]

        given_file.insert(given_file.index(i)+2, str(part_1))
        given_file.insert(given_file.index(part_1)+1, str(part_2))
        #given_file.extend(combined_parts)
        given_file.remove(i)
        continue


    elif len(i) > 0:
        if i.find('0') == -1 and i.find('1') == -1 and i.find('2') == -1 and i.find('3') == -1 and i.find('4') == -1\
            and i.find('5') == -1 and i.find('6') == -1 and i.find('7') == -1 and i.find('8') == -1 and i.find('9') == -1:
            while not i[:1].isalpha():
                i = i[1:]

            while not i[-1:].isalpha():
                i = i[:-1]

            if i.lower() not in all_words:
                all_words[i.lower()] = 1 
            elif i.lower() in all_words:
                all_words[i.lower()] += 1
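A likely cause of the endless loop is while not i[:1].isalpha(). Once a token made only of punctuation (for example '...') has been stripped down to "", i[:1] is also "", "".isalpha() is False, and i = i[1:] leaves i as "" forever. Separately, inserting into and removing from given_file while a for loop is iterating over it makes the loop's position in the list unreliable. The following is only a sketch of the same word count that avoids both problems. It assumes given_file is a list of strings, and strip_ends and count_words are hypothetical names, not from the original code.

```python
from collections import Counter

def strip_ends(word):
    """Trim non-letters from both ends; the `word and` guard stops once word is ""."""
    while word and not word[:1].isalpha():
        word = word[1:]
    while word and not word[-1:].isalpha():
        word = word[:-1]
    return word

def count_words(given_file):
    """Count lowercase words, splitting on "--" and skipping pieces with digits 0-9."""
    all_words = Counter()
    for token in given_file:  # given_file is only read here, never modified
        for piece in token.split("--"):
            if any(d in piece for d in "0123456789"):
                continue  # same rule as the original: ignore anything containing a digit
            word = strip_ends(piece)
            if word:  # skip pieces that were nothing but punctuation
                all_words[word.lower()] += 1
    return all_words

print(count_words(["Well--well", "...", "it's", "7th", "(Well)"]))
```

Splitting with str.split("--") handles every "--" in a token at once, so nothing has to be pushed back into the list for a second pass.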

Upvotes: 1

Views: 258

Answers (2)

tobias_k

Reputation: 82929

There are a few problems with your code:

  • The immediate problem is that, for a string made only of non-alpha characters, the second if can strip away the last remaining characters, and the third if then indexes into the empty string and raises the IndexError.
  • If the last character is non-alpha, you strip away the last two characters instead of one.
  • There is no need for those two nested loops, and you can use break instead of that boolean variable.
  • If i.lower()[x] is non-alpha, so is i[x], so the lower() calls are unnecessary; also, it is better to use i[-1] for the last index.
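To see the first two points in action, here is a sketch that wraps the question's inner loop in a hypothetical helper, question_strip, and feeds it two sample words chosen for illustration. The redundant outer loop is left out; for these inputs it does not change the result.

```python
def question_strip(i):
    """The question's inner loop, wrapped in a function for testing."""
    cut_it_out = True
    while (len(i) != 0) and cut_it_out:
        if i.lower()[0].isalpha() and i.lower()[len(i) - 1].isalpha():
            cut_it_out = False
        if not i.lower()[len(i) - 1].isalpha():
            i = i[:len(i) - 2]  # bug: drops the last TWO characters
        if not i.lower()[0].isalpha():  # raises IndexError once i is ""
            i = i[1:]
    return i

print(question_strip("hello!"))  # prints: hell

try:
    question_strip("!?")
except IndexError as exc:
    print("IndexError:", exc)  # prints: IndexError: string index out of range
```

With "!?" the second if slices the string down to "", and the third if then evaluates "".lower()[0].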

After fixing those issues, but keeping the general idea the same, your code becomes:

while len(i) > 0:
    if i[0].isalpha() and i[-1].isalpha():
        break
    if not i[-1].isalpha():
        i = i[:-1]
    elif not i[0].isalpha(): # actually, just 'else' would be enough, too
        i = i[1:]
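As a quick check, the corrected loop can be wrapped in a function (strip_loop is a hypothetical name, not from the original) and run on a few edge cases, including an all-punctuation input and an empty string:

```python
def strip_loop(i):
    """Trim non-letters from both ends of i, one character per iteration."""
    while len(i) > 0:
        if i[0].isalpha() and i[-1].isalpha():
            break  # both ends are letters: done
        if not i[-1].isalpha():
            i = i[:-1]  # drop one trailing non-letter
        elif not i[0].isalpha():
            i = i[1:]  # drop one leading non-letter
    return i

for word in ['"hello!"', "(it's)", "!?", ""]:
    print(repr(strip_loop(word)))
```

The apostrophe in it's survives, because only the ends of the string are trimmed, and "!?" now ends up as "" instead of raising.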

But that's still a bit hard to follow. I suggest using two loops for the two ends of the string:

while i and not i[:1].isalpha():
    i = i[1:]
while i and not i[-1:].isalpha():
    i = i[:-1]
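The two-loop version can be packaged the same way; strip_ends is a hypothetical name. Slicing with i[:1] and i[-1:] returns "" rather than raising on an empty string, and the leading `i and` check is what ends each loop once nothing is left.

```python
def strip_ends(i):
    """Trim non-letters from the start, then from the end, of i."""
    while i and not i[:1].isalpha():
        i = i[1:]   # drop one leading non-letter
    while i and not i[-1:].isalpha():
        i = i[:-1]  # drop one trailing non-letter
    return i

words = ["--well--", "...", "x", "42nd"]
print([strip_ends(w) for w in words])  # ['well', '', 'x', 'nd']
```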

Or you could just use a regular expression, something like this:

import re
i = re.sub(r"^[^a-zA-Z]+|[^a-zA-Z]+$", "", i)

This reads: replace one or more (+) characters that are not ([^...]) in the class a-zA-Z, and that come either directly after the start of the string (^) or (|) directly before the string's end ($), with "".
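A short sketch of the regex in use (strip_regex and the sample words are illustrative, not from the original). One difference from the isalpha() loops is worth knowing: the class a-zA-Z covers only ASCII letters, while str.isalpha() also accepts letters such as ñ and ú, so the regex trims those off the ends of non-English words.

```python
import re

# Leading or trailing runs of characters outside a-zA-Z
EDGE_NON_LETTERS = re.compile(r"^[^a-zA-Z]+|[^a-zA-Z]+$")

def strip_regex(i):
    """Remove leading and trailing runs of non-ASCII-letter characters."""
    return EDGE_NON_LETTERS.sub("", i)

print(strip_regex("'quoted'"))        # quoted
print(repr(strip_regex("...")))       # ''
print(strip_regex("ñandú!"))          # and
print("ñ".isalpha(), "ú".isalpha())   # True True
```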

Upvotes: 1

qPCR4vir

Reputation: 3571

I think your problem is a consequence of an overcomplicated solution. The error was pointed out by @tobias_k. In any case, your code can be very inefficient, because every slice such as i[1:] builds a new copy of the string. Try to simplify; for example, try this (I have not tested it yet):

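for i in given_file:
    beg = 0
    end = len(i) - 1
    # skip leading non-letters
    while beg <= end and not i[beg].isalpha():
        beg = beg + 1
    # skip trailing non-letters
    while beg <= end and not i[end].isalpha():
        end = end - 1
    res = ""
    if beg <= end:
        # end is an inclusive index, so the slice must run to end + 1
        res = i[beg:end + 1]
    # res is the trimmed word, or "" if i contains no letters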
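Wrapped as a function (strip_by_index is a hypothetical name), the index-based approach can be checked on a few edge cases. Because end is an inclusive index, the final slice has to run to end + 1; slicing to end alone would drop the last letter. Only the final slice builds a new string, instead of one copy per removed character.

```python
def strip_by_index(i):
    """Find the first and last letters by index, then slice once."""
    beg = 0
    end = len(i) - 1
    while beg <= end and not i[beg].isalpha():
        beg += 1
    while beg <= end and not i[end].isalpha():
        end -= 1
    # end is inclusive; if no letters were found, beg > end and the slice is ""
    return i[beg:end + 1]

print([strip_by_index(w) for w in ["**bold**", "?!", "a", ""]])  # ['bold', '', 'a', '']
```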

Upvotes: 1
