Reputation: 875
How to replace patterns like <html>
and </mainbody>
by <4>
and </8>
. Here 4 and 8 are the number of alphabets inside <>.
Input is to be taken from file.
import re
def main():
fh=open("input.txt")
pattern=re.compile("</?[a-zA-Z]+>") #regular expression to find patterns <html>, </html>
for line in fh:
print(re.sub(pattern,"***",line.strip()))
if __name__=="__main__":main()
Upvotes: 0
Views: 80
Reputation: 174662
Use a custom method to return the length of the match:
def get_length(obj):
s = obj.groups()[0]
return '</{}>'.format(len(s[1:])) if s.startswith('/') else '<{}>'.format(len(s))
>>> re.sub("<(/?[a-zA-Z]+)>", get_length, '<html>')
'<4>'
>>> re.sub("<(/?[a-zA-Z]+)>", get_length, '</html>')
'</4>'
I hope you realize your regular expression is very basic, and it won't deal with tags that have attributes correctly.
Upvotes: 1