How to make lxml output file with utf-8 encoding

Question

data.xml



                
        Bojarski
        -
        -            
    
                
        Genç
        Yasemin
        fgjfgnfgn

SAMPLE CODE

from lxml import etree

dom = etree.parse('data.xml')
root = dom.getroot()

for article in dom.xpath('Article[Affiliation="-"]'):
    root.remove(article)

dom.write('output.xml')

This code deletes the articles whose Affiliation is equal to -, i.e. whose Affiliation tag looks like <Affiliation>-</Affiliation>. When I store the remaining output in output.xml, the Unicode character ç in Genç is written as the character reference &#231; (Gen&#231;). I want to store it as it is.

Code's output


                
        Gen&#231;
        Yasemin
        fgjfgnfgn

Required output


                
        Genç
        Yasemin
        fgjfgnfgn

Sergey Belash · Accepted Answer

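The write method of the tree returned by etree.parse takes an encoding parameter. Without it, lxml serializes as ASCII and writes non-ASCII characters as numeric character references. You may also pass xml_declaration=True to declare the encoding in the output document.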

dom.write('output.xml', encoding='utf-8', xml_declaration=True)

See lxml documentation.
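To see the difference side by side, here is a minimal sketch that writes the same filtered tree twice, once with the default settings and once with encoding='utf-8'. The sample document is an assumption: only the Article and Affiliation element names come from the question, while Articles and LastName are made up for illustration, as are the helper write_filtered and the temporary file names.

```python
import os
import tempfile

from lxml import etree

# Hypothetical sample data: only Article and Affiliation are taken from the
# question; the Articles and LastName element names are assumptions.
SAMPLE = (
    "<Articles>"
    "<Article><LastName>Bojarski</LastName><Affiliation>-</Affiliation></Article>"
    "<Article><LastName>Genç</LastName><Affiliation>fgjfgnfgn</Affiliation></Article>"
    "</Articles>"
)


def write_filtered(path, **write_kwargs):
    """Drop articles whose Affiliation is "-", write the rest, return the file bytes."""
    root = etree.fromstring(SAMPLE.encode("utf-8"))
    for article in root.xpath('Article[Affiliation="-"]'):
        root.remove(article)
    # Extra keyword arguments are passed straight through to write().
    etree.ElementTree(root).write(path, **write_kwargs)
    with open(path, "rb") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    # Default settings: non-ASCII characters become character references.
    default_bytes = write_filtered(os.path.join(tmp, "default.xml"))
    # Explicit UTF-8: the character is stored as UTF-8 bytes.
    utf8_bytes = write_filtered(
        os.path.join(tmp, "utf8.xml"), encoding="utf-8", xml_declaration=True
    )

print(default_bytes.decode("ascii"))
print(utf8_bytes.decode("utf-8"))
```

Both files describe the same document to an XML parser. The character reference and the raw UTF-8 character are two spellings of the same ç, so the choice only matters for how the file looks to people and to tools that do not parse XML.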
