Reputation: 327
I read my text file into a list of strings using with open:
with open('./doc', 'r') as f:
    dat = f.readlines()
Then I want to clean the data with a for loop, keeping only the lines that do not start with '<':
docs = []
for i in dat:
    if i.strip()[0] != '<':
        docs.append(i)
This raises an error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-131-92a67082e677> in <module>()
      1 docs = []
      2 for i in dat:
----> 3     if i.strip()[0] != '<':
      4         docs.append(i)
IndexError: string index out of range
But if I change the code to process only the first 3000 lines, it works:
docs = []
for i in dat[:3000]:
    if i.strip()[0] != '<':
        docs.append(i)
My text file contains 93408 lines. Why can't I process all of them? Thanks!
Upvotes: 1
Views: 1040
Reputation: 16224
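One or more lines could be empty (or contain only whitespace). For such a line i.strip() returns an empty string, so indexing it with [0] raises IndexError. The first 3000 lines presumably contain no such line, which is why the slice works. You need to check for an empty string before taking the first character: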
    if i.strip() != "" and i.strip()[0] != '<':
        docs.append(i)
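A variant of the same check, as a minimal sketch rather than the only way to do it: str.startswith() returns False on an empty string instead of raising, so it can replace the [0] index. The function name drop_tag_lines and the sample list are made up for illustration; in your case you would pass dat instead.

```python
def drop_tag_lines(lines):
    """Keep non-blank lines whose first non-space character is not '<'."""
    kept = []
    for line in lines:
        stripped = line.strip()
        # A blank line strips to '', so stripped[0] would raise IndexError.
        # An empty string is falsy, and startswith() never raises on it.
        if stripped and not stripped.startswith('<'):
            kept.append(line)
    return kept


# Hypothetical sample data standing in for the contents of dat.
sample = ["<doc id=1>\n", "first line\n", "\n", "   \n", "</doc>\n", "second line\n"]
docs = drop_tag_lines(sample)
print(docs)
```

Like the condition above, this drops blank lines as well as lines starting with '<'. If you want to keep blank lines, use "if not stripped.startswith('<'):" on its own.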
Upvotes: 2