How to replace HTML codes in an HTML file using Python?

Question

I'm trying to replace all HTML codes in my HTML file with a for loop (not sure if this is the easiest approach) without changing the formatting of the original file. When I run the code below, the codes don't get replaced. Does anyone know what could be wrong?

import re
tex=open('ALICE.per-txt.txt', 'r')

tex=tex.read()

for i in tex:
  if i =='&otilde;':
      i=='õ'
  elif i == '&ccedil;':
      i=='ç'

with open('Alice1.replaced.txt', "w") as f:
    f.write(tex)
    f.close()

Matthias · Accepted Answer

You can use html.unescape.

>>> import html
>>> html.unescape('&otilde;')
'õ'
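html.unescape is not limited to named references such as &otilde;. According to the Python documentation it also converts decimal and hexadecimal numeric character references, following the HTML5 rules. A small sketch (the sample string is made up for illustration):

```python
import html

# Named (&ccedil;, &otilde;), decimal (&#231;) and hexadecimal (&#xF5;) references
sample = "Informa&ccedil;&otilde;es, &#231;, &#xF5;"
print(html.unescape(sample))  # prints: Informações, ç, õ
```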

Applied to your code:

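import html

# Read the whole file; UTF-8 is assumed here, adjust it to match your file
with open('ALICE.per-txt.txt', 'r', encoding='utf-8') as f:
    html_text = f.read()

# Convert every character reference (&otilde;, &#231;, ...) to its character
html_text = html.unescape(html_text)

# Write the result to a new file so the original is kept
with open('Alice1.replaced.txt', 'w', encoding='utf-8') as f:
    f.write(html_text)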

Please note that I opened the files with a with statement. It closes the file automatically at the end of the with block, which your code forgot to do for the file you read.
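As for why the loop in the question changed nothing: iterating over a string yields one character at a time, so i can never equal a multi-character code such as &otilde;; i=='õ' is a comparison, not an assignment; and even an assignment to i would not modify tex, because Python strings are immutable. If you only want to replace a few specific codes (for example, to leave &lt; and &gt; alone, which html.unescape would turn into < and > and so alter the HTML markup), a loop over str.replace does work. The sketch below assumes just the two codes from the question; REPLACEMENTS and replace_codes are hypothetical names.

```python
# Hypothetical table of the codes to replace and the characters they stand for;
# extend it with whatever other codes appear in your file.
REPLACEMENTS = {
    "&otilde;": "õ",
    "&ccedil;": "ç",
}


def replace_codes(text: str) -> str:
    """Return a copy of text with every code in REPLACEMENTS replaced.

    str.replace returns a new string (strings are immutable), so the
    result is reassigned to text on each pass.
    """
    for code, char in REPLACEMENTS.items():
        text = text.replace(code, char)
    return text


# Codes not in the table, such as &lt; and &gt;, are left untouched
print(replace_codes("Informa&ccedil;&otilde;es &lt;b&gt;"))  # prints: Informações &lt;b&gt;
```

To apply it to the file, read and write it as in the code above, calling replace_codes(html_text) instead of html.unescape(html_text).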
