Reputation: 51
Consider a string:
str1="abcd<aaa>some thing <#^&*some more!#$@ </aaa>
abcdefgasf <aaa>asfaf %^&*$saf asf %$^ </aaa>
<another tag> some text </another tag>
<aaa>sfafaff#%%%^^</aaa> "
Now in the above string how to replace the special characters and white spaces that are present between the tag <aaa>
and </aaa>
?
The replacing character should be '_'.
Upvotes: 0
Views: 257
Reputation: 24951
Here is a possible solution, is a little bit complex, so I'll explain it step by step.
We are going to use a module called re
, for regular expressions:
import re
OK, here is our string:
s = 'abcd<aaa>some thing <#^&*some more!#$@ </aaa> abcdefgasf <aaa>asfaf %^&*$saf asf %$^ </aaa> <another tag> some text </another tag> <aaa>sfafaff#%%%^^</aaa>'
First, let's get all the content inside the tags:
inside_tags = re.findall('<aaa>(.+?)</aaa>', s)
Now, lets iterate through each content of inside_tags
and replace the special characters:
cleaned_contents = [ re.sub('[^\w ]', '_' , content) for content in inside_tags ]
So, in cleaned_contents
now we have the contents inside the tags, but with the special characters replaced. Now, lets zip
(join in a tuple) each content inside a tag with its "cleaned" content:
zipped = zip(inside_tags, cleaned_contents)
And finally, search the tag contents in the string and replace them with the new cleaned content:
for old, new in zipped:
s = s.replace(old, new)
NOTE: If you don't understand something (there is a bunch of weird stuff here, like ?
, [^\w ]
, zip
) post your comment below and I'll explain it.
Upvotes: 2
Reputation: 49920
First, you'll need to split up the string around the tags (you can used re.split() if the string is well-behaved, otherwise use an XML parser); then you can use re.sub() to replace the characters you want replaced.
Upvotes: 0