Replacing special characters between multiple substrings in a string

Question

Consider a string:

str1="abcdsome thing <#^&*some more!#$@  
                            abcdefgasf asfaf %^&*$saf  asf %$^ 
                             some text 
                            sfafaff#%%%^^ "

Now, in the above string, how can I replace the special characters and whitespace that appear between the tag and ?

The replacement character should be '_'.

juliomalegria · Accepted Answer

Here is a possible solution. It is a little complex, so I'll explain it step by step.

We are going to use a module called re, for regular expressions:

import re

OK, here is our string:

s = 'abcdsome thing <#^&*some more!#$@  abcdefgasf asfaf %^&*$saf  asf %$^   some text  sfafaff#%%%^^'

First, let's get all the content inside the tags:

inside_tags = re.findall('(.+?)', s)
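The tag names did not survive in the pattern above, so as a rough sketch, assume the tags are `<start>` and `</start>` (hypothetical names chosen only for illustration). The non-greedy `(.+?)` then captures the shortest possible text between each pair of tags. `re.DOTALL` is added here on the assumption that the content may span several lines, as it appears to in the question.

```python
import re

# Hypothetical tag names; the tags used in the original post are not known.
sample = 'abcd<start>some thing <#^&*some more!#$@</start> abc <start>some text</start> xyz'

# (.+?) is non-greedy: it stops at the first closing tag it meets.
# re.DOTALL lets "." match newlines, in case the content spans lines.
inside_tags = re.findall(r'<start>(.+?)</start>', sample, flags=re.DOTALL)
print(inside_tags)  # ['some thing <#^&*some more!#$@', 'some text']
```

Without the `?`, the greedy `(.+)` would run from the first opening tag to the last closing tag and return a single long match.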

Now, let's iterate over each item in inside_tags and replace the special characters:

cleaned_contents = [re.sub(r'[^\w ]', '_', content) for content in inside_tags]
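To see what the character class does: `[^\w ]` matches any single character that is neither a word character (letter, digit, underscore) nor a space, so spaces survive. The question also asks for whitespace to be replaced. One possible variant, shown here as an assumption about the intent, is `\W`, which matches every non-word character, spaces included.

```python
import re

content = 'some thing <#^&*some more!#$@'

# Keeps letters, digits, underscores and spaces; everything else becomes "_".
keep_spaces = re.sub(r'[^\w ]', '_', content)

# Variant: \W matches every non-word character, including whitespace.
no_spaces = re.sub(r'\W', '_', content)

print(keep_spaces)  # some thing _____some more____
print(no_spaces)    # some_thing______some_more____
```

The raw-string prefix `r'...'` keeps Python from treating `\w` as a string escape before the regex engine sees it.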

So cleaned_contents now holds the contents of the tags with the special characters replaced. Next, let's zip each original content with its "cleaned" version, pairing them into tuples:

zipped = zip(inside_tags, cleaned_contents)

And finally, search for each tag's original content in the string and replace it with the cleaned content:

for old, new in zipped:
    s = s.replace(old, new)
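The find, clean, and replace steps can also be collapsed into one pass by giving `re.sub` a function as the replacement. This is an alternative sketch rather than the method described above, and it again assumes hypothetical `<start>`/`</start>` tags. One practical difference: `str.replace` substitutes every occurrence of the old text anywhere in the string, including outside the tags, while the callback only touches text matched between tags.

```python
import re

# Hypothetical tag names; the tags used in the original post are not known.
TAGGED = re.compile(r'(<start>)(.+?)(</start>)', flags=re.DOTALL)

def clean_tagged(text):
    """Replace non-word characters (including whitespace) with '_' inside tags only."""
    def _clean(match):
        opening, content, closing = match.groups()
        return opening + re.sub(r'\W', '_', content) + closing
    return TAGGED.sub(_clean, text)

s = 'ab cd<start>some thing <#^&*</start> x y %^ <start>some\ntext</start> z#'
result = clean_tagged(s)
print(result)
# ab cd<start>some_thing______</start> x y %^ <start>some_text</start> z#
```

Text outside the tags (`ab cd`, `x y %^`, `z#`) is left untouched, and the newline inside the second tag is replaced like any other whitespace.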

NOTE: If you don't understand something (there is a bunch of weird stuff here, like ?, [^\w ], zip), post a comment below and I'll explain it.
