StolaDev

Reputation: 137

How do I replace words (in a txt file) that match my list of strings?

I was successful with a single-word replacement:

email = open('email.txt', 'r').read()

def single_string_replace(email):
    return email.replace('word1', 'REDACTED')

But I could not get a list of words to work "flawlessly". This is my attempt:

email = open('email.txt', 'r').read()
banned_words = ['word1', 'phrase one']

def list_replace(email):
    list_place = 0
    while list_place < len(banned_words):
        for word in banned_words:
            email = email.replace(word, 'REDACTED')
            list_place += 1
        return email

Ideally, I want to keep the .txt file unchanged and only see the changes through a print() statement such as:

print(list_replace(email))

The issue that I am having is:

As always has been, is, and shall be: all suggestions are welcome!

Thank you

Upvotes: 3

Views: 640

Answers (3)

kederrac

Reputation: 17322

You could use re.sub, with \b word boundaries so that only whole words match:

import re


email = open('email.txt', 'r').read()
banned_words = ['word1', 'phrase one']
# re.escape() keeps any regex metacharacters in the banned words literal
pattern = '|'.join(rf'\b{re.escape(w)}\b' for w in banned_words)

def list_replace(email):
    return re.sub(pattern, 'REDACTED', email)

print(list_replace(email))
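
A slightly more defensive sketch of the same idea, in case it helps. It assumes matching should ignore case and that a longer phrase should win over a shorter word it contains; neither is stated in the question. The helper name build_pattern, the mask parameter, and the sample text are hypothetical, and it works on a string so the .txt file is never touched.

```python
import re

def build_pattern(banned_words):
    # Longest entries first, so a longer phrase wins over a shorter word inside it
    ordered = sorted(banned_words, key=len, reverse=True)
    # re.escape() makes characters such as '.' or '+' match literally
    alternatives = "|".join(re.escape(w) for w in ordered)
    # \b anchors keep 'word1' from matching inside 'word10'
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

def list_replace(text, banned_words, mask="REDACTED"):
    # An empty list would build a pattern that matches at every word boundary
    if not banned_words:
        return text
    return build_pattern(banned_words).sub(mask, text)

sample = "Word1 went here, but word10 stays. This is phrase one."
print(list_replace(sample, ["word1", "phrase one"]))
```

One caveat: \b only marks a boundary next to a letter, digit, or underscore, so an entry that starts or ends with punctuation (for example "c++") would need different anchors.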

Upvotes: 1

SidharthMacherla

Reputation: 400

Here is a function that replaces words. You can edit swlist inside the function to add or remove the words to mask.


Function to replace text

from nltk import word_tokenize

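def mask_word(with_sw):
    # word_tokenize needs NLTK's tokenizer data to be downloaded beforehand
    swlist = ['dog', 'cat']  # words to mask
    tokens = word_tokenize(with_sw)
    # Mask every token found in swlist and rejoin with single spaces
    # (joining avoids the leading space that string concatenation produced)
    return " ".join("REDACTED" if token in swlist else token for token in tokens)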

An example usage is below

text = "this is a dog and hotdog test"

print(mask_word(text))

Output looks like this:

this is a REDACTED and hotdog test

Upvotes: 1

Sohail Saha

Reputation: 573

Try it this way:

words = open('email.txt').read().split()  # list of words; split() already drops newlines
censored_words = ['ADD', 'YOUR', 'WORDS', 'HERE']

for word in words:
    if word in censored_words:
        print(word)  # print every occurrence of a censored word
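
The loop above only prints the matches. A minimal sketch that masks them instead, assuming single-word entries only (a phrase such as 'phrase one' spans two tokens and would not match) and that punctuation attached to a word should be ignored when comparing. The function name mask_tokens and the sample line are hypothetical.

```python
import string

def mask_tokens(line, censored_words, mask="REDACTED"):
    censored = set(censored_words)  # set lookup is fast for long lists
    masked = []
    for token in line.split():
        # Compare without surrounding punctuation, so "dog," still matches "dog"
        core = token.strip(string.punctuation)
        if core and core in censored:
            # Swap the word for the mask, keeping the attached punctuation
            token = token.replace(core, mask)
        masked.append(token)
    return " ".join(masked)

print(mask_tokens("the dog, the hotdog and the cat.", ["dog", "cat"]))
```

Note that split() followed by join() collapses runs of whitespace and newlines into single spaces, so this suits line-by-line use better than whole files.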

Upvotes: 0
