user2406501

Reputation: 13

Censoring a text string using a dictionary and replacing words with Xs. Python

I'm trying to make a simple program that takes a string of text t and a list of words l and prints the text but with the words in l replaced by a number of Xs corresponding to letters in the word.

Problem: My code also replaces parts of words that match words in l. How can I make it target only whole words?

def censor(t, l):

    for cenword in l:
        number_of_X = len(cenword)
        sensurliste = {cenword : ("x"*len(cenword))}

        for cenword, x in sensurliste.items():
            word = t.replace(cenword, x)
            t = word.replace(cenword, x)

    print (word)

Upvotes: 1

Views: 9343

Answers (6)

Bhanu Deepak

Reputation: 1

def censor_string(text, censorlst, replacer):
    word_list = text.split()
    for censor in censorlst:
        index = 0
        for word in word_list:
            if censor.lower() == word.lower():
                # Exact (case-insensitive) match: mask the whole word.
                ch = len(censor) * replacer
                word_list[index] = ch
            elif censor.lower() == word[0:-1].lower():
                # Match followed by one trailing character (e.g. "!"): keep it.
                ch = len(censor) * replacer
                word_list[index] = ch + word[-1]
            index += 1
    return " ".join(word_list)

censor_string('Today is a Wednesday!', ['Today', 'a'], '-')
censor_string('The cow jumped over the moon.', ['cow', 'over'], '*')
censor_string('Why did the chicken cross the road?', ['Did', 'chicken','road'], '*')

Upvotes: 0

Piotr

Reputation: 161

I did it a bit more compactly:

def censor_string(text, banned_words, replacer):
    return "".join([x + " " if x.lower() not in banned_words else replacer*len(x) + " " for x in text.split(" ") ])

But I am facing a problem with special characters like "?" or a comma. If I run the function below:

censor_string("Today is a Wednesday!", ["is", "Wednesday"], "*")

The output I receive is "Today ** a Wednesday!" instead of "Today ** a *********!"

Any ideas how to skip or ignore anything but letters and numbers in the string?
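
One possible way to do that, sketched under the assumption that only punctuation at the start or end of a token matters: strip it with str.strip(string.punctuation) before comparing, then mask just the remaining core. The helper name censor_ignoring_punctuation is made up for this example, and both sides of the comparison are lowercased so that "Wednesday" in the list can match.

```python
import string

def censor_ignoring_punctuation(text, banned_words, replacer):
    # Hypothetical helper: compare case-insensitively, ignoring punctuation
    # at the edges of each space-separated token.
    banned = {w.lower() for w in banned_words}
    out = []
    for token in text.split(" "):
        core = token.strip(string.punctuation)
        if core and core.lower() in banned:
            # Mask only the core, keeping the punctuation around it.
            token = token.replace(core, replacer * len(core), 1)
        out.append(token)
    return " ".join(out)

print(censor_ignoring_punctuation("Today is a Wednesday!", ["is", "Wednesday"], "*"))
# prints: Today ** a *********!
```

Punctuation inside a word (as in "don't") is left alone, so such words only match if the list contains them in exactly that form.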

Upvotes: 0

likarson

Reputation: 95

This is very easy to understand and clean:

def censor(text, word):
    return text.replace(word, "*" * len(word))

Upvotes: 0

mata

Reputation: 69052

Another way of doing this would be to use regular expressions to get all words:

import re

blacklist = ['ccc', 'eee']

def replace(match):
    word = match.group()
    if word.lower() in blacklist:
        return 'x' * len(word)
    else:
        return word

text = 'aaa bbb ccc. ddd eee xcccx.'

text = re.sub(r'\b\w*\b', replace, text, flags=re.I|re.U)
print(text)

This has the advantage of working with all kinds of word boundaries that regex recognizes.
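
A variation on the same idea, as a rough sketch rather than a definitive version: instead of matching every word and checking it against the list, the list itself can be turned into a single pattern of the form \b(?:word1|word2)\b. The function name censor_words and the mask parameter are invented for this example, and it assumes every listed word starts and ends with a letter, digit or underscore (otherwise \b will not line up with the word's edges).

```python
import re

def censor_words(text, blacklist, mask="x"):
    # Hypothetical helper: build \b(?:word1|word2|...)\b from the blacklist.
    # re.escape keeps any regex metacharacters in the words literal.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in blacklist) + r")\b",
        flags=re.IGNORECASE,
    )
    # Replace each match with a mask of the same length.
    return pattern.sub(lambda m: mask * len(m.group()), text)

print(censor_words("aaa bbb ccc. ddd eee xcccx.", ["ccc", "eee"]))
# prints: aaa bbb xxx. ddd xxx xcccx.
```

The "ccc" inside "xcccx" is left untouched because there is no word boundary on either side of it.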

Upvotes: 2

lenz

Reputation: 5818

You can either use a RegExp (module re) for replacement, or split the input string into what you think is a "whole word".

If you consider anything separated by whitespace to be a word, you can do the following:

def censor(t, l):
    # Map every censored word to a run of x's of the same length.
    sensurliste = {cenword: "x" * len(cenword) for cenword in l}
    censored = []
    for word in t.split():
        # Use the replacement if there is one, otherwise keep the word.
        censored.append(sensurliste.get(word, word))
    return ' '.join(censored)

Note that this does not preserve the original spacing. Also, if your text contains punctuation, this might not produce what you think it should. For example, if t contains the word 'stupid!', but the list only has 'stupid', it will not be replaced.

If you want to tackle all this, you will need to perform tokenisation. You might also have to think of upper case words.
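
A minimal sketch of such a tokenisation, assuming a "word" is simply a run of letters, digits and underscores (which is what \w matches): re.split with a capturing group returns the separators too, so joining the pieces back together keeps the original spacing and punctuation. The name censor_tokens and the mask parameter are made up for this example.

```python
import re

def censor_tokens(text, words, mask="x"):
    # Hypothetical helper. casefold() gives a case-insensitive comparison.
    banned = {w.casefold() for w in words}
    # The capturing group makes re.split keep the separators (spaces,
    # punctuation) in the result, so the original spacing survives the join.
    tokens = re.split(r"(\W+)", text)
    return "".join(
        mask * len(tok) if tok.casefold() in banned else tok
        for tok in tokens
    )

print(censor_tokens("He said:  stupid! STUPID idea.", ["stupid"]))
# prints: He said:  xxxxxx! xxxxxx idea.
```

One limitation of this simple definition is that apostrophes and hyphens split words, so "don't" is seen as "don" and "t".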

Upvotes: 0

DanChianucci

Reputation: 1175

First of all, I believe you want to have your for loops on the same level, so that when one completes the other starts.

Secondly, it looks like you have extra code which doesn't really do anything.

For example, sensurliste will only ever hold the censored word paired with its "X" string. The first for loop is therefore unneeded, because it is trivial to create the "X" string on the spot in the second for loop.

Then, you are saying

word = t.replace(cenword, x)
t = word.replace(cenword, x)

The second line does nothing, because word already has all instances of cenword replaced. So this can be shortened to just

t = t.replace(cenword, x)

Finally, this is where your problem is: Python's replace method doesn't care about word boundaries, so it will replace all instances of cenword, whether or not they are full words.
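
A small illustration of that behaviour (the sample sentence is made up for this example):

```python
text = "the cat sat in a category"
censored = text.replace("cat", "xxx")
print(censored)  # the "cat" inside "category" is replaced as well
# prints: the xxx sat in a xxxegory
```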

You could use a regex so that it only finds instances of full words; however, I would just use something more along the lines of

def censort(t, l):
    words = t.split()                       # split the text into a list of words
    for i in range(len(words)):             # for each word in the text
        if words[i] in l:                   # if it needs to be censored
            words[i] = "X" * len(words[i])  # replace it with X's
    return " ".join(words)                  # rejoin the list into a string

Upvotes: 1
