Reputation: 2146
I'm trying to find and replace items in a newline-separated list of strings such as
aba
abanga
abaptiston
abarelix
With a list like
aba
aca
ada
If an item in the second list appears in the first, it should be deleted from the first.
I have code that half works:
def replace_all(text, dic):
    # Replace every occurrence of each key with its value (substring match)
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

with open("words.txt", "r") as f:
    content = f.readlines()

# Renamed from `str` so the builtin is not shadowed
text = ''.join(str(e) for e in content)  # list may include numbers
delet = {"aba": "", "aca": "", "ada": ""}
txt = replace_all(text, delet)

with open("deltedwords.txt", "w") as f:
    f.write(txt)
Unfortunately, this also matches partial strings (false positives), so the end result is
nga
ptiston
relix
Adding whitespace or other characters before the search strings doesn't help, as it tends to produce only false negatives.
Upvotes: 0
Views: 87
Reputation: 164
How about using:
content_without_keywords = filter(lambda x: x.strip() not in delet.keys(), content)
txt = ''.join(str(e) for e in content_without_keywords)
to remove only exactly matched lines.
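As a rough illustration of why this avoids the false positives, here is a minimal sketch in Python 3 that runs both approaches side by side. The `content` list is an assumed in-memory stand-in for what `f.readlines()` would return for the sample words in the question; it is not read from the real file.

```python
# Assumed stand-in for f.readlines() on words.txt (sample data from the question).
content = ["aba\n", "abanga\n", "abaptiston\n", "abarelix\n"]
delet = {"aba": "", "aca": "", "ada": ""}

# Substring replacement, as in the question: "aba" is also cut out of longer words.
replaced = "".join(content)
for old, new in delet.items():
    replaced = replaced.replace(old, new)

# Whole-line comparison: a line is dropped only if it equals a key exactly.
# In Python 3 filter() returns an iterator, which join() accepts directly.
kept = filter(lambda line: line.strip() not in delet, content)
txt = "".join(kept)

print(repr(replaced))  # '\nnga\nptiston\nrelix\n'
print(repr(txt))       # 'abanga\nabaptiston\nabarelix\n'
```

Testing membership with `not in delet` checks the dictionary keys directly, so `.keys()` is optional here.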
Upvotes: 1
Reputation: 8326
You can simply filter, but I would argue that there is no need for a dictionary if you are simply deleting entries.
If order doesn't matter, use a set:
>>> content = set(['aba', 'abanga', 'abaptiston', 'abarelix'])
>>> unwanted_words = set(['aba', 'aca', 'ada'])
>>> content.difference(unwanted_words)
set(['abanga', 'abarelix', 'abaptiston'])
If order does matter, just use a list comprehension:
>>> content = ['aba', 'abanga', 'abaptiston', 'abarelix']
>>> unwanted_words = ['aba', 'aca', 'ada']
>>> [word for word in content if word not in unwanted_words]
['abanga', 'abaptiston', 'abarelix']
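The two ideas can also be combined: keep the list comprehension so order is preserved, but test membership against a set so each lookup does not scan the whole unwanted list. A minimal sketch in Python 3, where `drop_unwanted` is a hypothetical helper name introduced only for this example:

```python
def drop_unwanted(words, unwanted):
    """Return words in their original order, minus exact matches in unwanted."""
    unwanted_set = set(unwanted)  # set membership tests are O(1) on average
    return [word for word in words if word not in unwanted_set]


content = ['aba', 'abanga', 'abaptiston', 'abarelix']
unwanted_words = ['aba', 'aca', 'ada']

result = drop_unwanted(content, unwanted_words)
print(result)  # ['abanga', 'abaptiston', 'abarelix']
```

This should only matter for long lists; for a handful of words the plain list comprehension is just as good.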
Upvotes: 1