Reputation: 191
I have written my code this far, but it does not work properly with remove(). Can anyone help me?
'''
Created on Apr 21, 2015
@author: Pallavi
'''
from pip._vendor.distlib.compat import raw_input
print ("Enter Query")
str=raw_input()
fo = open("stopwords.txt", "r+")
str1 = fo.read();
list=str1.split("\n");
fo.close()
words=str.split(" ");
for i in range(0,len(words)):
    for j in range(0,len(list)):
        if(list[j]==words[i]):
            print(words[i])
            words.remove(words(i))
Here is the error:
Enter Query
let them cry try diesd
let them try
Traceback (most recent call last):
  File "C:\Users\Pallavi\workspace\py\src\parser.py", line 17, in <module>
    if(list[j]==words[i]):
IndexError: list index out of range
Upvotes: 14
Views: 92486
Reputation: 11
One more easy way to remove words from a list is to convert both lists to sets and take the difference between them.
words = ['a', 'b', 'a', 'c', 'd']
words = set(words)
stopwords = ['a', 'c']
stopwords = set(stopwords)
final_list = words - stopwords
final_list = list(final_list)
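One thing to keep in mind with the set approach: a set drops duplicate words and does not keep the original order. The sketch below, with sample data assumed for illustration, compares the set difference with a variant that only turns the stop words into a set and filters the word list in place.

```python
words = ["b", "a", "d", "b", "c"]   # sample data, assumed for illustration
stopwords = ["a", "c"]

# Set difference: duplicates collapse and order is not guaranteed
as_set = set(words) - set(stopwords)

# Only the stop words become a set; the word list is filtered in order
stop_set = set(stopwords)
in_order = [w for w in words if w not in stop_set]

print(sorted(as_set))  # sorted only to make the printout stable
print(in_order)
```

If the query is a sentence whose word order matters, the second form is probably the safer choice.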
Upvotes: 1
Reputation: 540
As an observation, this could be another elegant way to do it:
new_words = list(filter(lambda w: w not in stop_words, initial_words))
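For reference, a runnable version of that one-liner. The query string is the one from the sample run in the question; the contents of `stop_words` are inferred from the words that run printed, so treat them as an assumption.

```python
initial_words = "let them cry try diesd".split()
stop_words = {"let", "them", "try"}  # assumed contents, inferred from the question's output

# filter() keeps the items for which the lambda returns True;
# list() is needed in Python 3 because filter() returns a lazy iterator
new_words = list(filter(lambda w: w not in stop_words, initial_words))
print(new_words)
```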
Upvotes: 9
Reputation: 5055
''' call this script in a Bash Konsole like so: python reject.py
purpose of this script: remove certain words from a list of words ,
e.g. remove invalid packages in a request-list using
a list of rejected packages from the logfile,
say on https://fai-project.org/FAIme/#
remove trailing spaces e.g. with KDE Kate in wordlist like so:
kate: remove-trailing-space on; BOM off;
'''
with open("rejects", "r+") as fooo:
    stwf = fooo.read()
toreject = stwf.split("\n")
with open("wordlist", "r+") as bar:
    woL = bar.read()
words = woL.split("\n")
new_words = [word for word in words if word not in toreject]
with open("cleaned", "w+") as foobar:
    for ii in new_words:
        foobar.write("%s\n" % ii)
Upvotes: 3
Reputation: 3647
The errors you have (besides my other comments) occur because you're modifying a list while iterating over it. range(0, len(words)) is evaluated once, before the loop starts, so after you've removed some elements the last positions no longer exist and words[i] raises an IndexError.
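A minimal sketch of that failure in isolation, using sample data rather than the asker's file: the index range is fixed before the loop runs, while the list keeps shrinking. Note that "them" also survives, because removing "let" shifts it into a position the loop has already passed.

```python
words = ["let", "them", "cry", "try", "diesd"]
stopwords = ["let", "them", "try"]  # sample data, assumed for illustration

error = None
try:
    # range() is computed once from the original length (5)
    for i in range(len(words)):
        if words[i] in stopwords:
            words.remove(words[i])  # the list shrinks, the range does not
except IndexError as exc:
    error = exc

print(words)
print(repr(error))
```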
I would do it this way:
words = ['a', 'b', 'a', 'c', 'd']
stopwords = ['a', 'c']
for word in list(words):  # iterating on a copy since removing will mess things up
    if word in stopwords:
        words.remove(word)
An even more Pythonic way is to use a list comprehension:
new_words = [word for word in words if word not in stopwords]
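Applied to the program in the question, this might look like the sketch below. The helper names `load_stopwords` and `remove_stopwords` are made up for illustration, and it assumes the stop-word file holds one word per line. It also avoids calling variables `str` or `list`, since those names shadow the built-ins.

```python
def load_stopwords(path):
    """Read one stop word per line into a set (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as fo:
        return {line.strip() for line in fo if line.strip()}


def remove_stopwords(query, stopwords):
    """Return the words of `query` that are not in `stopwords`, in order."""
    return [word for word in query.split() if word not in stopwords]


# Inline set used as a stand-in for load_stopwords("stopwords.txt")
stopwords = {"let", "them", "try"}
result = remove_stopwords("let them cry try diesd", stopwords)
print(result)
```

Building a new list instead of calling remove() sidesteps the index problem entirely, and the set makes each membership test cheap.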
Upvotes: 41