Python: search and replace - string delineation issue

Question

I'm trying to find and remove entries from a list of strings (one per line) such as

aba
abanga
abaptiston
abarelix

using a second list like

aba
aca
ada

Any item in the first list that also appears in the second list should be deleted.

I have code that half works:

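def replace_all(text, dic):
    # Replace every occurrence of each key with its value (substring match).
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

with open("words.txt", "r") as f:
    content = f.readlines()

# Join the lines into one string; named `text` so the built-in `str` is not shadowed.
text = ''.join(str(e) for e in content)  #list may include numbers

delet = {"aba": "", "aca": "", "ada": ""}
txt = replace_all(text, delet)

# Use a context manager so the output file is flushed and closed.
with open("deltedwords.txt", "w") as f:
    f.write(txt)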

Unfortunately, this also matches partial strings (false positives), so "aba" is stripped out of every longer word and the end result is

nga
ptiston
relix

Adding whitespace or other delimiter characters around the search words doesn't help: it tends to produce only false negatives.

C.B. · Accepted Answer

You can simply filter, but I would argue that there is no need for a dictionary if you are simply deleting entries.

If order doesn't matter, use a set:

>>> content = set(['aba', 'abanga', 'abaptiston', 'abarelix'])
>>> unwanted_words = set(['aba', 'aca', 'ada'])
>>> content.difference(unwanted_words)
set(['abanga', 'abarelix', 'abaptiston'])

If order does matter, use a list comprehension:

>>> content = ['aba', 'abanga', 'abaptiston', 'abarelix']
>>> unwanted_words = ['aba', 'aca', 'ada']
>>> [word for word in content if word not in unwanted_words]
['abanga', 'abaptiston', 'abarelix']
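
To tie this back to the file-based code in the question, here is a minimal sketch that compares whole lines rather than substrings. The function names `remove_unwanted` and `filter_file` are made up for illustration, and the sketch assumes one word per line; it strips the trailing newline before comparing, so "aba" only matches a line that is exactly "aba".

```python
def remove_unwanted(lines, unwanted):
    """Keep lines whose stripped text is not exactly one of the unwanted words."""
    unwanted = set(unwanted)  # set lookup avoids rescanning a list for every line
    return [line for line in lines if line.strip() not in unwanted]


def filter_file(src_path, dst_path, unwanted):
    """Hypothetical helper: read src_path, drop unwanted lines, write dst_path."""
    with open(src_path, "r") as src:
        lines = src.readlines()
    kept = remove_unwanted(lines, unwanted)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)  # kept lines still carry their original newlines
    return kept


# In-memory demonstration with the sample data from the question.
sample = ["aba\n", "abanga\n", "abaptiston\n", "abarelix\n"]
result = remove_unwanted(sample, ["aba", "aca", "ada"])
print(result)  # ['abanga\n', 'abaptiston\n', 'abarelix\n']
```

With the files from the question, this would be called as `filter_file("words.txt", "deltedwords.txt", ["aba", "aca", "ada"])`. Because the comparison is on the whole line, order is preserved and partial matches such as "abanga" are left alone.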
