Reputation: 13
I'm trying to write a simple program that takes a string of text t and a list of words l, and prints the text with each word in l replaced by as many Xs as the word has letters.
Problem: My code also replaces parts of words that match words in l. How can I make it target only whole words?
    def censor(t, l):
        for cenword in l:
            number_of_X = len(cenword)
            sensurliste = {cenword : ("x"*len(cenword))}
            for cenword, x in sensurliste.items():
                word = t.replace(cenword, x)
                t = word.replace(cenword, x)
        print (word)
Upvotes: 1
Views: 9343
Reputation: 1
    def censor_string(text, censorlst, replacer):
        word_list = text.split()
        for censor in censorlst:
            index = 0
            for word in word_list:
                if censor.lower() == word.lower():
                    ch = len(censor) * replacer
                    word_list[index] = ch
                elif censor.lower() == word[0:-1].lower():
                    ch = len(censor) * replacer
                    word_list[index] = ch+word[-1]
                index+=1
        return " ".join(word_list)
    censor_string('Today is a Wednesday!', ['Today', 'a'], '-')
    censor_string('The cow jumped over the moon.', ['cow', 'over'], '*')
    censor_string('Why did the chicken cross the road?', ['Did', 'chicken','road'], '*')
Upvotes: 0
Reputation: 161
I have made it a little more compact:
    def censor_string(text, banned_words, replacer):
        return "".join([x + " " if x.lower() not in banned_words else replacer*len(x) + " " for x in text.split(" ")])
But I am having a problem with punctuation such as "?" or a comma. If I run the function like this:
    censor_string("Today is a Wednesday!", ["is", "Wednesday"], "*")
the output is "Today ** a Wednesday!" instead of "Today ** a *********!"
Any ideas how to ignore everything except letters and numbers in the string?
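One possible way to handle this, sketched under the assumption that punctuation only appears at the start or end of a token: strip the punctuation for the comparison only, then put it back around the replacement. The function name censor_ignoring_punctuation is made up for illustration, and the comparison is made case-insensitive by lower-casing both sides.

```python
import string

def censor_ignoring_punctuation(text, banned_words, replacer):
    # Lower-case the banned words once so the comparison ignores case.
    banned = {w.lower() for w in banned_words}
    result = []
    for token in text.split(" "):
        # Strip leading/trailing punctuation only for the comparison.
        core = token.strip(string.punctuation)
        if core and core.lower() in banned:
            # Replace just the core and keep the punctuation around it.
            start = token.find(core)
            token = token[:start] + replacer * len(core) + token[start + len(core):]
        result.append(token)
    return " ".join(result)

print(censor_ignoring_punctuation("Today is a Wednesday!", ["is", "Wednesday"], "*"))
# prints: Today ** a *********!
```

Punctuation inside a token (for example an apostrophe) is not handled by this sketch.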
Upvotes: 0
Reputation: 95
This is clean and very easy to understand:
    def censor(text, word):
        return text.replace(word, "*" * len(word))
Upvotes: 0
Reputation: 69052
Another way of doing this would be to use regular expressions to get all words:
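    import re
    blacklist = ['ccc', 'eee']
    def replace(match):
        # Called once per matched word; returns the replacement text
        word = match.group()
        if word.lower() in blacklist:
            return 'x' * len(word)
        else:
            return word
    text = 'aaa bbb ccc. ddd eee xcccx.'
    # \w+ rather than \w* so that empty matches are not passed to replace()
    text = re.sub(r'\b\w+\b', replace, text, flags=re.I|re.U)
    print(text)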
This has the advantage of working with all kinds of word boundaries that the regex engine recognizes.
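A variation on the same idea, as a rough sketch: instead of matching every word and checking it in a callback, build one pattern that matches only the blacklisted words between \b boundaries. The function name censor_words is an assumption for illustration, and the sketch assumes every blacklisted word starts and ends with a letter, digit or underscore, since \b only behaves as a word boundary next to such characters.

```python
import re

def censor_words(text, blacklist):
    # Escape each word so characters like '.' or '+' are matched literally.
    alternatives = "|".join(re.escape(word) for word in blacklist)
    # \b on both sides restricts matches to whole words.
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", flags=re.IGNORECASE)
    # Replace each match with as many x's as it has characters.
    return pattern.sub(lambda m: "x" * len(m.group()), text)

print(censor_words("aaa bbb ccc. ddd eee xcccx.", ["ccc", "eee"]))
# prints: aaa bbb xxx. ddd xxx xcccx.
```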
Upvotes: 2
Reputation: 5818
You can either use a regular expression (module re) for the replacement, or split the input string into what you consider to be whole words.
If you consider anything separated by whitespace to be a word, you can do the following:
    def censor(t, l):
        # Map each censored word to a string of x's of the same length
        sensurliste = {cenword: "x" * len(cenword) for cenword in l}
        censored = []
        for word in t.split():
            # Words that are not in the mapping are kept unchanged
            censored.append(sensurliste.get(word, word))
        return ' '.join(censored)
Note that this does not preserve the original spacing. Also, if your text contains punctuation, this might not produce what you expect. For example, if t contains the word 'stupid!' but the list only has 'stupid', it will not be replaced.
If you want to tackle all this, you will need to perform tokenisation. You might also have to think about upper-case words.
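As a minimal sketch of what such tokenisation could look like, assuming a "word" is any run of letters, digits and underscores: re.split with a capturing group keeps the separators, so the original spacing and punctuation survive the round trip. The name censor_tokens and the replacer default are made up for this example.

```python
import re

def censor_tokens(text, banned_words, replacer="x"):
    # Lower-case the banned words so upper-case variants are caught too.
    banned = {w.lower() for w in banned_words}
    # The capturing group makes re.split return the separators as well.
    tokens = re.split(r"(\W+)", text)
    censored = [
        replacer * len(tok) if tok.lower() in banned else tok
        for tok in tokens
    ]
    # Joining with "" restores the original spacing and punctuation.
    return "".join(censored)

print(censor_tokens("That is  STUPID, you stupid!", ["stupid"]))
# prints: That is  xxxxxx, you xxxxxx!
```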
Upvotes: 0
Reputation: 1175
First of all, I believe you want your for loops on the same level, so that the second starts when the first completes.
Secondly, it looks like you have extra code that doesn't really do anything.
For example, sensurliste will only ever hold the censored words paired with their "X" strings. The first for loop is therefore unneeded, because it is trivial to create the "X" string on the spot in the second for loop.
Then, you write
    word = t.replace(cenword, x)
    t = word.replace(cenword, x)
The second line does nothing, because word already has all instances of cenword replaced. So this can be shortened to just
    t = t.replace(cenword, x)
Finally, this is where your problem is: Python's replace method doesn't care about word boundaries, so it replaces every instance of cenword whether or not it is a full word.
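A quick way to see this behaviour, using a made-up sentence: the replace call also hits the start of "classic", whereas comparing whitespace-separated words one by one leaves it alone. The variable names are only for illustration.

```python
text = "the class is classic"

# str.replace matches the substring everywhere, including inside "classic".
replaced = text.replace("class", "x" * len("class"))
print(replaced)     # prints: the xxxxx is xxxxxic

# Comparing each whitespace-separated word only touches exact matches.
whole_words = " ".join(
    "x" * len(w) if w == "class" else w
    for w in text.split()
)
print(whole_words)  # prints: the xxxxx is classic
```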
You could use a regex so that it only finds instances of full words. However, I would just use something more along the lines of
    def censort(t, l):
        words = t.split()                      # split the text into a list of words
        for i in range(len(words)):            # for each word in the text
            if words[i] in l:                  # if it needs to be censored
                words[i] = "X" * len(words[i]) # replace it with X's
        return " ".join(words)                 # rejoin the list into a string
Upvotes: 1