Reputation: 545
I am learning Python and am trying to understand data.split(). I found the following in another StackOverflow question (link here), which discusses appending to a file in Python.
I have created biki.txt per the above link. Here's my code:
import re
import os
import sys
with open("biki.txt","r") as myfile:
    mydata = myfile.read()
data = mydata.replace("http","%http")
for m in range (1,1000):
    dat1 = data.split("%")[m]
    f = open ("new.txt", "a")
    f.write(dat1)
    f.close()
But when I run the above, I get the error:
dat1 = data.split("%")[m]
IndexError: list index out of range
How come? I can't find documentation on what that [m] does, but removing it doesn't fix the issue. (If I remove [m], the error changes and says that the argument to f.write(dat1) must be a string or read-only character buffer.)
Thank you for any help or ideas!
Upvotes: 2
Views: 12234
Reputation: 1319
First, you need to understand what is happening with m in your code. Assuming:
for m in range(1,1000):
    print(m)
In the first iteration of the loop, the value of m will be 1.
In each following iteration (for as long as m is less than 1000), m increases by 1. That is, if m was 1 in the previous iteration, it will be 2 in this one. The last value m takes is 999.
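To make the loop values concrete, here is a small sketch of what range(1, 1000) produces. The names values, first, last and count are only for illustration and are not part of the original code.

```python
# range(1, 1000) starts at 1 and stops before 1000.
values = list(range(1, 1000))

first = values[0]    # first value the loop variable takes
last = values[-1]    # last value the loop variable takes
count = len(values)  # number of iterations

print(first, last, count)
```

So the loop in the question asks for indexes 1 through 999, whatever the length of the list being indexed.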
Second, you need to understand that the expression data.split('%') will split a string where it finds a '%' character, returning a list.
For example, assuming:
data = "one%two%three%four%five"
numbers = data.split('%')
numbers will be a list with five elements like this:
numbers = ['one','two','three','four','five']
To get an element of a list, you subscript the list, which means using the [] operator with an index number (you can actually do a lot more, like slicing):
numbers[0] # will return 'one'
numbers[1] # will return 'two'
...
numbers[4] # will return 'five'
Note that the first element in a list has index 0.
The list numbers has 5 elements, and indexing starts at 0, so the last element has index 4. If you try to subscript with an index higher than 4, the Python interpreter will raise an IndexError, since there is no element at that index.
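As a quick check of the indexing rule above, the following sketch reads index 4 and index 5 from the same five-element list. The helper safe_get is hypothetical, introduced here only to catch the IndexError instead of letting the program stop.

```python
data = "one%two%three%four%five"
numbers = data.split('%')


def safe_get(items, index):
    """Return items[index], or None when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return None


last_item = safe_get(numbers, 4)  # index 4 exists: the fifth element
missing = safe_get(numbers, 5)    # index 5 does not exist in a 5-element list

print(last_item, missing)
```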
Your code is generating a list with fewer elements than the range you created, so the list indexes are exhausted before the for loop is done. For example, if the list returned by data.split("%") has 500 elements, its last valid index is 499 (don't forget that list indexes start at 0), so an IndexError is raised when m reaches 500.
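The failure can be reproduced on a small scale. This is only a sketch: the sample string is made up and stands in for the contents of biki.txt, and failed_at is an illustrative name.

```python
# Made-up stand-in for the file contents: two URLs run together.
sample = "http://a.example/http://b.example/"

data = sample.replace("http", "%http")
parts = data.split("%")  # the piece before the first "%" is an empty string

failed_at = None
for m in range(1, 1000):
    try:
        parts[m]
    except IndexError:
        # The list ran out of elements long before m reached 999.
        failed_at = m
        break

print(parts, failed_at)
```

The loop fails as soon as m equals len(parts), which is the first index past the end of the list.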
If I understood what you want to do, you can achieve it with this code:
with open("input.txt","r") as file_input:
    raw_text = file_input.read()
formatted_text = raw_text.replace("http","%http")
data_list = formatted_text.split("%")
with open("output.txt","w") as file_output:
    for data in data_list:
        file_output.write(data+'\n') # writing one URL per line ;)
Upvotes: 2
Reputation: 122032
You should just iterate over data.split():
for dat1 in data.split("%"):
Now you only split once (rather than on every iteration), it doesn't have to contain 1000+ items (which was the cause of the IndexError), and it gives a string to f.write() rather than a list (the source of the other error).
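Put together, the loop might look like the sketch below. The sample string and the temporary output path are assumptions made so the example runs on its own. Skipping empty pieces and adding a newline after each one are my own additions, not part of the one-line fix above.

```python
import os
import tempfile

# Made-up data, as it would look after replace("http", "%http").
data = "%http://a.example/%http://b.example/"

# Write to a temporary directory instead of new.txt in the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "new.txt")

with open(out_path, "a") as f:
    for dat1 in data.split("%"):   # split once, then visit each piece
        if dat1:                   # skip the empty piece before the first "%"
            f.write(dat1 + "\n")   # dat1 is a string, so write() accepts it

with open(out_path) as f:
    lines = f.read().splitlines()

print(lines)
```

Opening the file once with a with block also avoids reopening and closing it on every iteration.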
Upvotes: 2