user1065961

Reputation: 21

Unicode string Python

I need to write to a file a string that contains the degree sign (°).

This string is stored in a variable and, as expected, when I try f.write(myVariable.encode('utf-8')) I get a UnicodeDecodeError.

If I try to write this string to a file like:

x = u'aaa°°bbb'
f.write(x.encode('utf-8'))

This works fine, but I can't write x = u'aaa°°bbb' in my code, because 'aaa°°bbb' comes from a database and is stored in a variable. If I try newVar = unicode(myVariable), I get a UnicodeDecodeError.

I would need to pass myVariable to the 'u' python operator... How can I do this?

Upvotes: 2

Views: 2076

Answers (5)

ekhumoro

Reputation: 120568

If myVariable is a string that comes from an external source (like a database), you first need to find out what kind of string it is.

Since you seem to be using python2, there are two main possibilities: myVariable is either a unicode string object, or a bytes string object. A unicode string is one that has already been decoded to text characters. A bytes string is one that has already been encoded (using an encoding like 'utf-8' or 'latin-1').

It appears from the example code in your question that myVariable is a bytes string object.

The reason you get the first UnicodeDecodeError is that you are trying to re-encode a bytes string. To do this, python first has to decode myVariable to a unicode string object before it can apply the new encoding. By default, python assumes an "ascii" encoding when automatically decoding in this way, but since myVariable contains bytes beyond the ascii range (0-127), an error occurs.

The same situation occurs when you try to pass myVariable to the unicode function. Unless an explicit encoding is given, python will again assume "ascii", and you will see the same UnicodeDecodeError.
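To make the implicit step visible, here is a minimal sketch. The byte string is an assumption: it stands in for a UTF-8 encoded value coming back from the database, and the variable names are only illustrative.

```python
# Hypothetical stand-in for the database value: the UTF-8 bytes of
# u'aaa\xb0\xb0bbb' (two degree signs between the letters).
my_variable = b'aaa\xc2\xb0\xc2\xb0bbb'

# This mirrors what python2 does behind the scenes before re-encoding:
# an ascii decode, which fails on the first byte outside the ascii range.
try:
    my_variable.decode('ascii')
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True

# Naming the real encoding explicitly succeeds and yields text.
text = my_variable.decode('utf-8')
```

If the database uses a different encoding, that name would have to replace 'utf-8' here.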

Now, when it comes to writing myVariable to a file, the solution is very simple if it is a bytes string object: do nothing! Just write myVariable directly to the file:

f = open(path, 'wb')
f.write(myVariable)
f.close()

However, when you read the file back, you will need to know the original encoding of myVariable in order to decode it to unicode:

f = open(path, 'rb')
myVariable = f.read().decode('utf-8')
f.close()

And now if you modify myVariable and want to write it back out to file again, you have to remember that this time it is a unicode string, and so you need to encode it first:

f = open(path, 'wb')
f.write(myVariable.encode('utf-8'))
f.close()
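The three steps above can be put together as one round trip. This is only a sketch: the temporary path and the sample bytes are assumptions made for illustration, and the data is assumed to be UTF-8.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'degrees.txt')

# Assumed: UTF-8 bytes as they might arrive from the database.
raw = b'aaa\xc2\xb0\xc2\xb0bbb'

# 1. A bytes string goes to the file unchanged.
with open(path, 'wb') as f:
    f.write(raw)

# 2. Reading in binary mode gives bytes back; decode them to get text.
with open(path, 'rb') as f:
    text = f.read().decode('utf-8')

# 3. Modified text has to be encoded again before it is written.
text = text + u' C'
with open(path, 'wb') as f:
    f.write(text.encode('utf-8'))

# Read the final bytes back to inspect what ended up on disk.
with open(path, 'rb') as f:
    final_bytes = f.read()
```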

Upvotes: 1

mossplix

Reputation: 3865

Make sure you have # -*- coding: utf-8 -*- at the top of your Python file. It should then encode seamlessly.

Upvotes: -1

glglgl

Reputation: 91017

Depending on whether myVariable is a unicode object or a bytes object (the names differ between py2 and py3), you have to decide how to convert it.

As newVar = unicode(myVariable) fails to decode, you probably have a bytes object (str in py2). So you either have to convince your database to talk Unicode with you, or you have to know the encoding and decode the value accordingly.
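One way to handle both cases in a single place is a small helper. The name to_text and its parameters are hypothetical, and the encoding still has to be the one the database really uses.

```python
def to_text(value, encoding):
    """Return value as text, decoding it only if it is a bytes object."""
    # In py2, bytes is just another name for str.
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text: nothing to do.
    return value


# Assumed sample values: a latin-1 byte string and an already decoded string.
from_bytes = to_text(b'25\xb0', 'latin-1')
from_text = to_text(u'25\xb0', 'latin-1')
```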

Upvotes: 1

Ignacio Vazquez-Abrams

Reputation: 798456

Decode it after retrieving it, using whatever encoding your database uses.

s.decode('latin1')

Of course, if it's misencoded in the database in the first place then you'll need to compensate somehow.

s.encode('latin1').decode('utf8')
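A short sketch of that second case, assuming the value was stored as UTF-8 but handed back decoded as latin1. The sample bytes are made up for illustration.

```python
# The UTF-8 bytes of the degree sign, wrongly decoded as latin1:
# one character turns into two.
misdecoded = b'\xc2\xb0'.decode('latin1')

# Undo the wrong decode to recover the original bytes,
# then decode them with the encoding they were really written in.
repaired = misdecoded.encode('latin1').decode('utf8')
```

This only works because latin1 maps every byte to exactly one character, so the encode step restores the original bytes without loss.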

Upvotes: 2

sorin

Reputation: 170310

Open the file as text using codecs.open() with UTF-8 encoding and write Unicode strings without manually encoding them. It's easier and the code looks better.
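A minimal sketch of that approach might look like this. The temporary path and the sample string are assumptions, and the value must already be a Unicode string, so bytes from the database would still need decoding first.

```python
import codecs
import os
import tempfile

# Hypothetical file in a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'degrees.txt')

# Assumed sample text containing two degree signs.
text = u'aaa\xb0\xb0bbb'

# codecs.open encodes on write, so no manual .encode() call is needed.
with codecs.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# On read it decodes, so the result is a Unicode string again.
with codecs.open(path, 'r', encoding='utf-8') as f:
    read_back = f.read()
```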

Upvotes: 0
