Reputation: 21
I need to write to a file a string that contains the degree sign (°).
This string is stored in a variable and, as expected, when I try f.write(myVariable.encode('utf-8')) I get a UnicodeDecodeError.
If I write the string to a file like this:
x = u'aaa°°bbb'
f.write(x.encode('utf-8'))
it works fine, but I can't write x = u'aaa°°bbb' in my code, because 'aaa°°bbb' comes from a database and is stored in a variable. If I try newVar = unicode(myVariable), I get a UnicodeDecodeError.
I would need to pass myVariable to the 'u' Python operator... How can I do this?
Upvotes: 2
Views: 2076
Reputation: 120568
If myVariable is a string that comes from an external source (like a database), you first need to find out what kind of string it is.
Since you seem to be using Python 2, there are two main possibilities: myVariable is either a unicode string object or a bytes string object. A unicode string is one that has already been decoded to text characters. A bytes string is one that has already been encoded (using an encoding like 'utf-8' or 'latin-1').
It appears from the example code in your question that myVariable is a bytes string object.
You get the first UnicodeDecodeError because you are trying to re-encode a bytes string. To do this, Python first has to decode myVariable to a unicode string object before it can apply the new encoding. By default, Python assumes an "ascii" encoding when automatically decoding in this way, but since myVariable contains bytes beyond the ASCII range (0-127), an error occurs.
The same situation occurs when you try to pass myVariable to the unicode function. Unless an explicit encoding is given, Python will again assume "ascii", and you will see the same UnicodeDecodeError.
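The implicit step described above can be reproduced by hand. This is only a sketch: the byte values are an assumption (the UTF-8 encoding of the string from the question), and the decode is written out explicitly so the same lines behave identically on Python 2 and Python 3.

```python
# Assumed sample data: the UTF-8 bytes for u'aaa\u00b0\u00b0bbb'
# (the degree sign U+00B0 encodes to the two bytes C2 B0).
raw = b'aaa\xc2\xb0\xc2\xb0bbb'

# This is what Python 2 does implicitly inside raw.encode('utf-8')
# or unicode(raw): it decodes with the 'ascii' codec first.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the real encoding makes the decode succeed.
text = raw.decode('utf-8')
```

In Python 2, unicode(raw, 'utf-8') is equivalent to the last line.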
Now, when it comes to writing myVariable to a file, the solution is very simple if it is a bytes string object: do nothing! Just write myVariable directly to the file:
f = open(path, 'wb')
f.write(myVariable)
f.close()
However, when you read the file back, you will need to know the original encoding of myVariable in order to decode it to unicode (the example assumes 'utf-8'):
f = open(path, 'rb')
myVariable = f.read().decode('utf-8')
f.close()
And now if you modify myVariable and want to write it back out to the file again, you have to remember that this time it is a unicode string, so you need to encode it first:
f = open(path, 'wb')
f.write(myVariable.encode('utf-8'))
f.close()
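The three steps can be put together in one round trip. This is a minimal sketch, not the only way to do it: the sample bytes, the appended text, and the temporary file are assumptions made for illustration, and UTF-8 is assumed as the encoding throughout.

```python
import os
import tempfile

# Assumed sample data: UTF-8 bytes as they might arrive from the database.
raw = b'aaa\xc2\xb0\xc2\xb0bbb'

# Hypothetical scratch file so the sketch cleans up after itself.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # 1. Bytes in hand: write them unchanged in binary mode.
    with open(path, 'wb') as f:
        f.write(raw)

    # 2. Read back and decode with the known encoding to get text.
    with open(path, 'rb') as f:
        text = f.read().decode('utf-8')

    # 3. Modify the text, then encode it again before writing.
    text = text + u' 20\u00b0C'
    with open(path, 'wb') as f:
        f.write(text.encode('utf-8'))

    with open(path, 'rb') as f:
        final_bytes = f.read()
finally:
    os.remove(path)
```

Binary mode is used for every open call so that no implicit conversion happens on either Python version.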
Upvotes: 1
Reputation: 3865
Make sure you have # -*- coding: utf-8 -*- at the top of your Python file. It should then encode seamlessly.
Upvotes: -1
Reputation: 91017
Depending on whether myVariable is in unicode or bytes format (the naming differs between py2 and py3), you have to decide on the conversion.
As newVar = unicode(myVariable) fails to decode, you are probably dealing with bytes (str in py2). So you either have to convince your database to talk in Unicode with you, or you have to know the encoding and decode accordingly.
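One way to make that decision in code is a small helper that decodes only when it is handed bytes. The name to_text and the default encoding are hypothetical, chosen for this sketch; the right encoding depends on what the database actually returns.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it only if it is a bytes string.

    Hypothetical helper: `encoding` must match whatever the database
    really uses; 'utf-8' is just an assumed default.
    """
    if isinstance(value, bytes):  # `bytes` is an alias of `str` in py2
        return value.decode(encoding)
    return value  # already text, nothing to do


from_utf8 = to_text(b'\xc2\xb0')
from_latin1 = to_text(b'\xb0', 'latin-1')
already_text = to_text(u'\u00b0')
```

All three calls end up with the same one-character string containing the degree sign.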
Upvotes: 1
Reputation: 798456
Decode it after retrieving it, using whatever encoding your database uses.
s.decode('latin1')
Of course, if it's misencoded in the database in the first place then you'll need to compensate somehow.
s.encode('latin1').decode('utf8')
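Both cases can be tried on sample data. The byte strings below are assumptions for illustration: the first stands for a value genuinely stored as Latin-1, the second for UTF-8 data that was wrongly decoded as Latin-1 somewhere along the way.

```python
# Case 1: the database really stores Latin-1, so one decode is enough.
stored = b'25\xb0C'                # degree sign is the single byte B0
text = stored.decode('latin1')

# Case 2: UTF-8 bytes were decoded as Latin-1, producing two junk
# characters where the degree sign should be.
garbled = b'25\xc2\xb0C'.decode('latin1')

# Undo the wrong decode, then apply the right one.
repaired = garbled.encode('latin1').decode('utf8')
```

The repair works because Latin-1 maps every byte to exactly one code point, so encoding back to Latin-1 recovers the original bytes without loss.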
Upvotes: 2
Reputation: 170310
Open the file as text using codecs.open() with UTF-8 encoding and write Unicode strings without manually encoding them. It's easier and the code looks better.
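A short sketch of that approach, assuming the value has already been decoded to a unicode string. The sample text and the temporary file are made up for the example.

```python
import codecs
import os
import tempfile

# Assumed sample: text that is already unicode (decoded beforehand).
text = u'aaa\u00b0\u00b0bbb'

# Hypothetical scratch file for the demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # codecs.open encodes on write, so no manual .encode() call is needed.
    with codecs.open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # It also decodes on read, returning unicode directly.
    with codecs.open(path, 'r', encoding='utf-8') as f:
        read_back = f.read()

    # Raw view of the file to confirm what was stored.
    with open(path, 'rb') as f:
        on_disk = f.read()
finally:
    os.remove(path)
```

io.open(path, 'w', encoding='utf-8') offers the same convenience and is the same function as the built-in open in Python 3.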
Upvotes: 0