Mixing unicode and str in Python 2.x: problems?

mystr = 'aaaa'
myvar = u'My string %s' % str(mystr)

Can this be a problem in the future? I'm messing around with some in-house code that uses the email modules in Python and found some code like this. mystr will always contain only ASCII characters, since it comes from a list of predefined ASCII-only characters.

I didn't write the code, and having str(mystr) or mystr doesn't change the matter of the question.

With the first snippet, will I get a safe unicode object, or do I have to do

mystr = u'aaaa'
myvar = u'My string %s' % mystr

or

mystr = 'aaaa'
myvar = u'My string %s' % unicode(mystr)

?

(I know this is not the correct way of doing it, and I know I should handle the exceptions. I'm only asking whether the first snippet returns a valid unicode object, or whether Python messes up its internals or something when doing it.)

Upvotes: 2

Views: 629

Answers (2)

agf

Reputation: 176800

As long as the regular 8-bit string contains only ASCII characters, you're fine. This can be done to save processing time and / or memory space if you really only need ASCII.
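To make the hidden step visible, here is a minimal sketch of what Python 2 effectively does when a str meets a unicode template: it decodes the str with the default codec, which is ASCII unless someone has changed it. The explicit decode('ascii') below stands in for that implicit conversion, and the b'' literal stands in for a Python 2 str so the snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-
# b'' is a plain str on Python 2 and bytes on Python 3.
mystr = b'aaaa'  # ASCII-only byte string, as in the question

# Python 2 performs this decode implicitly during u'...' % mystr;
# it is written out here only to show the step.
myvar = u'My string %s' % mystr.decode('ascii')

print(repr(myvar))
```

The result is an ordinary unicode object, the same one you would get from the u'aaaa' variant in the question.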

Can it be a problem in the future? Yes, if you're taking input possibly in a non-ASCII character set and saving it in a string. It's also just generally a good idea to be consistent -- don't use strings as storage for text anywhere if you need Unicode widely, unless there is a good reason otherwise.

Upvotes: 1

Emil Ivanov

Reputation: 37633

Try putting actual Unicode symbols in the strings (like umlauts or Cyrillic) and watch all hell break loose. :)

s = 'свят' # world
v = u'здравей %s' % s # hello %s 

Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 0: ordinal not in range(128)

The problem is that you will most likely code your application, and one bright shiny day some Russian or German user will type her name and suddenly get an Internal Server Error for having a non-ASCII symbol in it.
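If non-ASCII input can ever reach that code, one way to avoid the error is to decode the bytes yourself, with the real encoding, before formatting. This is only a sketch: UTF-8 is an assumption about the input, and the names raw and implicit_ok are made up for the illustration. It runs on both Python 2 and 3.

```python
# -*- coding: utf-8 -*-
# Bytes as they might arrive from a form or a file. UTF-8 is an assumption
# here; use whatever encoding your input really has.
raw = u'свят'.encode('utf-8')

# What Python 2 attempts implicitly: decoding with the ASCII codec.
try:
    raw.decode('ascii')
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False

# Decoding explicitly with the right codec before formatting avoids the error.
v = u'здравей %s' % raw.decode('utf-8')
```

Here the ASCII decode fails on the first byte (0xd1), just like in the traceback above, while the explicit UTF-8 decode gives a proper unicode object.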


I know... I'm asking about the situation in my example, using ascii only in

No, there will be no problem. And IMHO this is a fault in Python, because it is a bug waiting to bite. This should have been a fatal error, but for historical reasons, I guess, it isn't.

Upvotes: 3
