rocknrollpartytime

Reputation: 53

Creating a censoring function from a list of bad words

I'm trying to create a function that censors words in a string. It's kinda working, with a few quirks.

This is my code:

def censor(sentence):
    badwords = 'apple orange banana'.split()
    sentence = sentence.split()

    for i in badwords:
        for words in sentence:
            if i in words:
                pos = sentence.index(words)
                sentence.remove(words)
                sentence.insert(pos, '*' * len(i))

    print " ".join(sentence)

sentence = "you are an appletini and apple. new sentence: an orange is a banana. orange test."

censor(sentence)

And the output:

you are an ***** and ***** new sentence: an ****** is a ****** ****** test.

Some punctuation is gone and the word "appletini" is replaced wrongly.

How can this be fixed?

Also, is there any simpler way of doing this kind of thing?

Upvotes: 1

Views: 4245

Answers (2)

jonrsharpe

Reputation: 122091

The specific problems are that:

  1. You don't consider punctuation at all; and
  2. You use the length of the "bad word", not the word, when inserting '*'s.
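
Both points can be reproduced in isolation. This is a minimal sketch using tokens from the question's sentence; the variable names are made up for illustration:

```python
# Tokens as str.split() produces them from part of the question's sentence.
tokens = "an appletini and apple.".split()
bad = "apple"

# Point 1: the full stop is part of the token "apple.", so replacing the
# whole token throws the punctuation away.
# Point 2: the mask is sized by the bad word (5), not by the token (9).
masked = ["*" * len(bad) if bad in token else token for token in tokens]

print(" ".join(masked))
# an ***** and *****
```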

I would switch the loop order around, so you only process the sentence once, and use enumerate rather than remove and insert:

def censor(sentence):
    badwords = ("test", "word") # consider making this an argument too
    sentence = sentence.split()

    for index, word in enumerate(sentence):
        if any(badword in word for badword in badwords):
            sentence[index] = "".join(['*' if c.isalpha() else c for c in word])

    return " ".join(sentence) # return rather than print

Testing str.isalpha means only letters are replaced with asterisks, so punctuation and digits attached to a word are left as they are. Demo:

>>> censor("Censor these testing words, will you? Here's a test-case!")
"Censor these ******* *****, will you? Here's a ****-****!"
            # ^ note length                         ^ note punctuation
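
Following the comment in the code about making the list an argument, a variant might look like the sketch below. The `badwords` parameter and the case-insensitive comparison via `str.lower` are assumptions added for illustration, not part of the code above.

```python
def censor(sentence, badwords=("test", "word")):
    """Mask the letters of every token that contains a bad word."""
    # Lower-case once so "Test" and "TEST" are caught as well (an assumption).
    lowered = [bad.lower() for bad in badwords]
    tokens = sentence.split()

    for index, token in enumerate(tokens):
        if any(bad in token.lower() for bad in lowered):
            # Mask letters only, so attached punctuation survives.
            tokens[index] = "".join("*" if c.isalpha() else c for c in token)

    return " ".join(tokens)


print(censor("A Test of bad WORDS.", badwords=["test", "word"]))
# A **** of bad *****.
```

Matching is still by substring, so a word such as "contest" would be masked too; whether that is wanted depends on the use case.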

Upvotes: 2

Serbitar

Reputation: 2224

Try:

for i in bad_word_list:
    sentence = sentence.replace(i, '*' * len(i))
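
Note that str.replace matches substrings, so "appletini" would come out as "*****tini". If only whole words should be masked, one option is a regular expression with word boundaries. This is a sketch of a different technique, not the loop above; `censor_whole_words` and its parameters are names made up for illustration.

```python
import re


def censor_whole_words(sentence, bad_word_list):
    """Mask only whole-word matches, leaving punctuation and spacing alone."""
    # re.escape guards against bad words containing regex metacharacters.
    alternatives = "|".join(re.escape(word) for word in bad_word_list)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    # Replace each match with asterisks of the same length.
    return pattern.sub(lambda match: "*" * len(match.group()), sentence)


print(censor_whole_words("an appletini and apple. Orange test.", ["apple", "orange"]))
# an appletini and *****. ****** test.
```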

Upvotes: 0
