Stumbler

Reputation: 2146

Python: search and replace - string delineation issue

I'm trying to remove entries from a newline-separated list of strings such as

aba
abanga
abaptiston
abarelix

using a second list like

aba
aca
ada

so that any item of the first list that also appears in the second list is deleted.

I have code that half works:

def replace_all(text, dic):
    # Replace every occurrence of each key with its value (substrings included).
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

with open("words.txt", "r") as f:
    content = f.readlines()

# Named `text` rather than `str` so the built-in str() is not shadowed.
text = ''.join(str(e) for e in content)  # list may include numbers

delet = {"aba": "", "aca": "", "ada": ""}
txt = replace_all(text, delet)

with open("deltedwords.txt", "w") as f:
    f.write(txt)

Unfortunately this also matches partial strings (false positives), so the end result is

nga
ptiston
relix

Adding whitespace or other delimiter characters around the words being searched doesn't help, as it tends to produce only false negatives.

Upvotes: 0

Views: 87

Answers (2)

gh640

Reputation: 164

How about using:

content_without_keywords = filter(lambda x: x.strip() not in delet, content)
txt = ''.join(content_without_keywords)

to remove only exactly matched lines.
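
As a rough, self-contained sketch of the same idea: each line is stripped of its trailing newline and compared as a whole, so `abanga` is left alone even though it contains `aba`. The function name `remove_exact_lines` and the in-memory sample list are assumptions made for illustration, not part of the original code.

```python
def remove_exact_lines(lines, unwanted):
    """Return the lines whose stripped text is not an exact match in `unwanted`."""
    unwanted = set(unwanted)  # hypothetical helper: a set makes each lookup cheap
    return [line for line in lines if line.strip() not in unwanted]


# Sample data standing in for f.readlines() on words.txt
lines = ["aba\n", "abanga\n", "abaptiston\n", "abarelix\n"]
result = remove_exact_lines(lines, ["aba", "aca", "ada"])
print("".join(result), end="")
```

Because the comparison is made per line rather than per substring, no delimiter tricks are needed.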

Upvotes: 1

C.B.

Reputation: 8326

You can simply filter, but I would argue that there is no need for a dictionary if you are simply deleting entries.

If order doesn't matter, use a set:

>>> content = set(['aba', 'abanga', 'abaptiston', 'abarelix'])
>>> unwanted_words = set(['aba', 'aca', 'ada'])
>>> content.difference(unwanted_words)
set(['abanga', 'abarelix', 'abaptiston'])

If it does, use a list comprehension:

>>> content = ['aba', 'abanga', 'abaptiston', 'abarelix']
>>> unwanted_words = ['aba', 'aca', 'ada']
>>> [word for word in content if word not in unwanted_words]
['abanga', 'abaptiston', 'abarelix']
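
The two ideas can also be combined: keep the original order by iterating over the lines, but hold the unwanted words in a set so each membership test stays fast. The sketch below applies that to files; the function name `filter_word_file` and its parameters are hypothetical, and it assumes one word per line.

```python
def filter_word_file(src_path, dst_path, unwanted_words):
    """Copy src_path to dst_path, skipping lines that exactly match an unwanted word."""
    unwanted = set(unwanted_words)  # set lookup instead of scanning a list
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # Compare the whole line (minus surrounding whitespace), not substrings
            if line.strip() not in unwanted:
                dst.write(line)
```

With the file names from the question, this would be called as `filter_word_file("words.txt", "deltedwords.txt", ["aba", "aca", "ada"])`.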

Upvotes: 1
