Reputation: 137
I was successful with a single-word replacement:
email = open('email.txt', 'r').read()

def single_string_replace(email):
    return email.replace('word1', 'REDACTED')
But I could not get a list of words to work "flawlessly". This is my attempt:
email = open('email.txt', 'r').read()
banned_words = ['word1', 'phrase one']

def list_replace(email):
    list_place = 0
    while list_place < len(banned_words):
        for word in banned_words:
            email = email.replace(word, 'REDACTED')
            list_place += 1
    return email
Ideally, I would like to keep the .txt files unchanged and only see the changes through a print() statement such as
print(list_replace(email))
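For reference, reading the file only loads its contents into memory, so the .txt file on disk is never modified; a with block is just a tidy way to make sure the handle gets closed:

with open('email.txt', 'r') as f:   # reading never changes the file on disk
    email = f.read()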
The issue that I am having is:
As always has been, is, and shall be: all suggestions are welcome!
Thank you
Upvotes: 3
Views: 640
Reputation: 17322
You could use re.sub:
import re

email = open('email.txt', 'r').read()
banned_words = ['word1', 'phrase one']
pattern = '|'.join(f'\\b{w}\\b' for w in banned_words)

def list_replace(email):
    return re.sub(pattern, 'REDACTED', email)

print(list_replace(email))
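If any of the banned words might contain regex metacharacters (a '.', '+', and so on), a slightly more defensive sketch escapes each word before building the pattern; re.IGNORECASE is optional and only there to catch different capitalisations:

import re

banned_words = ['word1', 'phrase one']
# re.escape protects characters that would otherwise be treated as regex syntax
pattern = '|'.join(rf'\b{re.escape(w)}\b' for w in banned_words)

def list_replace(email):
    return re.sub(pattern, 'REDACTED', email, flags=re.IGNORECASE)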
Upvotes: 1
Reputation: 400
Here is a function that replaces words. You can change swlist inside the function to add or remove stop words.
Function to replace text
from nltk import word_tokenize

def mask_word(with_sw):
    swlist = ['dog', 'cat']          # words to mask; extend as needed
    without_sw = ""
    tokens = word_tokenize(with_sw)  # split the text into word tokens
    for char in tokens:
        if char in swlist:
            without_sw = without_sw + " " + "REDACTED"
        else:
            without_sw = without_sw + " " + char
    return without_sw
An example usage is below
text = "this is a dog and hotdog test"
print(mask_word(text))
Output looks like this:
this is a REDACTED and hotdog test
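Note that word_tokenize relies on NLTK's tokenizer models; if they are not installed yet, a one-time download (assuming the standard 'punkt' package) sets them up:

import nltk
nltk.download('punkt')   # fetch the tokenizer models used by word_tokenize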
Upvotes: 1
Reputation: 573
Try it this way:
words = open('email.txt').read().split()  # get a list of words
words = [word.replace('\n', '') for word in words]  # remove any stray newlines
censored_words = ['ADD', 'YOUR', 'WORDS', 'HERE']

for word in words:
    if word in censored_words:
        print(word)  # print every occurrence of a censored word
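This only prints the matching words themselves; if the goal is to see the whole text with those words masked, one possible extension of the same split-based approach (note it will miss words with punctuation attached) is:

redacted = ' '.join('REDACTED' if word in censored_words else word for word in words)
print(redacted)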
Upvotes: 0