Reputation: 480
I am writing a function that will iterate through a list of text items - parse each item, and append the parsed items back into a list. The code is as below:
clean_list = []
def to_words( list ):
    i = 0
    while i <= len(list):
        doc = list[i]
        # 1. Remove HTML
        doc_text = BeautifulSoup(doc).get_text()
        # 2. Remove non-letters (not sure if this is advisable for all documents)
        letters_only = re.sub("[^a-zA-Z]", " ", doc_text)
        # 3. Convert to lower case, split into individual words
        words = letters_only.lower().split()
        # 4. Remove stop words
        stops = set(stopwords.words("english"))
        meaningful_words = [w for w in words if not w in stops]
        # 5. Join the words back into one string separated by space, and return the result.
        clean_doc = ( " ".join( meaningful_words ))
        i = i+1
        clean_list.append(clean_doc)
But when I pass the list into this function with to_words(list), I get this error: IndexError: list index out of range
I tried experimenting without defining the to_words function, i.e. avoiding the loop, manually setting i to 0, 1, 2, etc., and following through the steps of the function; this works fine.
Why am I facing this error when I use the function (and loop)?
Upvotes: 1
Views: 1382
Reputation: 16081
Change while i <= len(list) to while i < len(list).
List indexing starts from 0, so the valid indices run from 0 to len(list) - 1. With i <= len(list), the loop runs one more time with i equal to len(list), and that lookup raises the IndexError.
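As a rough illustration of the off-by-one, the sketch below uses a made-up three-item list (docs is a hypothetical name, not from the question) to show that indexing at len(docs) fails, while a strict < comparison visits every item exactly once.

```python
docs = ["first", "second", "third"]  # hypothetical sample data

# Valid indices are 0, 1 and 2; len(docs) is 3, so docs[3] does not exist.
try:
    docs[len(docs)]
    raised = False
except IndexError:
    raised = True

# With a strict comparison the loop stops after the last valid index.
visited = []
i = 0
while i < len(docs):
    visited.append(docs[i])
    i += 1
```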
1. Prefer a for loop to a while loop. A list can be iterated over directly, so no index is needed. For example:
for elem in list_:
    # Do your operation here
2. Don't use list as a variable name, because it shadows the built-in list type.
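Putting both suggestions together, a minimal sketch of the function might look like the following. To keep it runnable on its own it makes some assumptions: the HTML-stripping step is left out, STOPS is a small hypothetical stand-in for stopwords.words("english"), and the result is returned instead of appended to a global list.

```python
import re

# Hypothetical stand-in for stopwords.words("english"), so the sketch runs without NLTK.
STOPS = {"the", "is", "a", "of", "and"}

def to_words(docs):
    """Return one cleaned string per document in docs."""
    clean_list = []
    for doc in docs:  # iterate directly, no index to get wrong
        # Replace every non-letter character with a space.
        letters_only = re.sub("[^a-zA-Z]", " ", doc)
        # Lower-case and split into individual words.
        words = letters_only.lower().split()
        # Drop stop words, then join the rest back into one string.
        meaningful_words = [w for w in words if w not in STOPS]
        clean_list.append(" ".join(meaningful_words))
    return clean_list

cleaned = to_words(["The cat is here!", "A list of 3 items"])
```

Returning the list keeps the function free of global state, so calling it twice does not accumulate results from the earlier call.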
Upvotes: 1