List index out of range, with split()

Question

I am learning Python, and am trying to learn data.split(). I found the following in another StackOverflow question (link here), discussing appending a file in Python.

I have created biki.txt per the above link. Here's my code:

import re
import os
import sys 
with open("biki.txt","r") as myfile:
    mydata = myfile.read()
    data = mydata.replace("http","%http")
    for m in range (1,1000):
        dat1 = data.split("%")[m]
        f = open ("new.txt", "a")
        f.write(dat1)
        f.close()

But when I run the above, I get the error:

dat1 = data.split("%")[m]
IndexError: list index out of range

How come? I can't find documentation as to what that [m] does, but removing it doesn't fix the issue. (If I remove [m], the error changes and says that the argument to f.write(dat1) must be a string or read-only character buffer.)

Thank you for any help or ideas!

Pablo · Accepted Answer

First, you need to understand what is happening with m in your code. Consider:

for m in range(1,1000):
    print(m)

In the first iteration, the value of m is 1.

In each following iteration, m increases by 1, so it is 2 in the second iteration, 3 in the third, and so on up to 999 (range(1,1000) stops before 1000).

Second, you need to understand that the expression data.split('%') will split a string where it finds a '%' character, returning a list.

For example, assuming:

data = "one%two%three%four%five"
numbers = data.split('%')

numbers will be a list with five elements like this:

numbers = ['one','two','three','four','five']

To get an element from a list, you must subscript the list, which means using the [] operator with an index number (you can actually do a lot more, like slicing):

numbers[0] # will return 'one'
numbers[1] # will return 'two'
...
numbers[4] # will return 'five'

Note that the first element of a list has index 0.
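As a small sketch of the "a lot more" mentioned above, here are negative indexes and slices on the same example list. The variable names last, middle and without_first are only for illustration:

```python
data = "one%two%three%four%five"
numbers = data.split("%")

last = numbers[-1]            # negative indexes count from the end
middle = numbers[1:3]         # slice: indexes 1 and 2 (the stop index is excluded)
without_first = numbers[1:]   # everything except the element at index 0

print(last)
print(middle)
print(without_first)
```

Unlike a single out-of-range subscript, a slice that runs past the end of the list does not raise an error; it just returns fewer elements.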

The list numbers has 5 elements, and indexing starts at 0, so the last element has index 4. If you try to subscript with an index higher than 4, the Python interpreter raises an IndexError, since there is no element at that index.
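A minimal sketch of that failure, reusing the same example list. The try/except is only there so the snippet runs to the end instead of stopping at the error:

```python
data = "one%two%three%four%five"
numbers = data.split("%")

print(len(numbers))    # how many elements the list has
print(numbers[4])      # the last valid index is len(numbers) - 1

try:
    numbers[5]         # there is no element at index 5
except IndexError as error:
    message = str(error)
    print("IndexError:", message)
```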

Your code generates a list with fewer elements than the range you created, so the loop asks for an index that does not exist before the for loop is done. For example, if data.split("%") returns a list with 500 elements, its valid indexes are 0 to 499, so an IndexError is raised when m reaches 500.
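To illustrate on a small scale, here is a sketch with a short made-up string (the example URLs are placeholders, not the contents of your file). It records the first value of m that fails, and then shows that looping over the list itself never asks for a missing index:

```python
data = "%http://a.example%http://b.example"
parts = data.split("%")    # three elements; the first is an empty string

# range(1, 1000) assumes the list has at least 1000 elements.
failed_at = None
for m in range(1, 1000):
    try:
        parts[m]
    except IndexError:
        failed_at = m      # first index that has no element
        break

# Iterating over the list directly stops at its last element.
collected = []
for part in parts[1:]:     # skip index 0, as the original loop did
    collected.append(part)

print(failed_at)
print(collected)
```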

If I understood what you want to do, you can achieve it with this code:

with open("input.txt","r") as file_input:
    raw_text = file_input.read()

formated_text = raw_text.replace("http","%http")
data_list = formated_text.split("%")

with open("output.txt","w") as file_output:
    for data in data_list:
        file_output.write(data+'\n') # writing one URL per line ;)
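To see what the replace and split steps produce without touching any files, here is a small sketch on an in-memory string. The sample text and URLs are made up for illustration. One thing it shows: when the text begins with "http", the first element of the list is an empty string, so you may want to drop empty pieces before writing them out:

```python
# Hypothetical sample: three URLs run together with no separator.
raw_text = "http://a.example/1http://b.example/2https://c.example/3"

marked_text = raw_text.replace("http", "%http")   # also matches the "http" in "https"
pieces = marked_text.split("%")

# Keep only the non-empty pieces.
urls = [piece for piece in pieces if piece]

output = "\n".join(urls) + "\n"   # one URL per line
print(output, end="")
```

This assumes the file contains no "%" characters of its own. If it does, split("%") will also cut at those, so a marker that cannot appear in the data would be a safer choice.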
