Reputation: 93
I'm currently writing a short Python script to walk through some directories on a server, find what I'm looking for and save the data to an XML file.
The problem is that some of the data is written in other languages such as "ハローワールド" or something of similar format. When trying to save this in an entry in my XML I get the following traceback:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 18: ordinal not in range(128)
This is what my function that saves the data looks like:
def addHistoryEntry(self, title, url):
self.log.info('adding history entry {"title":"%s", "url":"%s"}' % (title, url))
root = self.getRoot()
history = root.find('.//history')
entry = etree.SubElement(history, 'entry')
entry.set('title', title)
entry.set('time', str(unixtime()))
entry.text = url
history.set('results', str(int(history.attrib['results']) + 1))
self.write(root)
self.getRoot()
is the following:
def getRoot(self):
return etree.ElementTree(file = self.config).getroot()
and here is the function that writes the data (self.write(root)
)
def write(self, xmlRoot):
bump = open(self.config, 'w+')
bump.write(dom.parseString(etree.tostring(xmlRoot, 'utf-8')).toxml())
bump.close()
imports are:
import xml.etree.ElementTree as etree
import xml.dom.minidom as dom
If anyone can help me out with this issue please do. Thanks for any help.
Upvotes: 1
Views: 299
Reputation: 36
you may use codecs library of python, and .decode('utf-8') to solve it.
import sys,codecs
reload(sys)
sys.setdefaultencoding("utf-8")
Upvotes: 1
Reputation: 2945
By default python will encode unicode into ASCII. As only 128 characters is supported in ASCII, it gonna raise exception value more than 128. So the solution is to encode your unicode into extended format such as latin1 or UTF8 or other ISO charset. I would suggest to encode in utf-8. The syntax would be:
u"Your unucode Text".encode("utf-8", "ignore")
For you
title.encode("utf-8", "ignore")
Upvotes: 0