Reputation: 43867
I'm porting a library so that it is compatible with both Python 2 and 3. The library receives strings or string-like objects from the calling application, and I need to ensure those objects get converted to unicode strings.
In Python 2 I can do:
unicode_x = unicode(x)
In Python 3 I can do:
unicode_x = str(x)
However, the best cross-version solution I have is:
import sys

def ensure_unicode(x):
    if sys.version_info < (3, 0):
        return unicode(x)
    return str(x)
which certainly doesn't seem great (although it works). Is there a better solution?
I am aware of unicode_literals and the u prefix, but neither of those solutions works here, as the inputs come from clients and are not literals in my library.
Upvotes: 14
Views: 15973
Reputation: 327
If six.text_type(b'foo') -> "b'foo'" in Python 3 is not what you want, as mentioned in Alex's answer, you probably want six.ensure_text(), available in six v1.12.0+.
In [17]: six.ensure_text(b'foo')
Out[17]: 'foo'
Ref: https://six.readthedocs.io/#six.ensure_text
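If you can't require six 1.12.0+, similar behaviour can be sketched in a few lines. This is only an illustration, not six's actual source: the name to_text and the text_type/binary_type aliases are made up here, and it assumes UTF-8 is the right default encoding for your inputs.

```python
import sys

PY3 = sys.version_info[0] >= 3
# Only the selected branch is evaluated, so `unicode` is never looked up on Python 3.
text_type = str if PY3 else unicode  # noqa: F821
binary_type = bytes if PY3 else str


def to_text(value, encoding="utf-8", errors="strict"):
    """Hypothetical helper: decode bytes, pass text through, reject anything else."""
    if isinstance(value, binary_type):
        return value.decode(encoding, errors)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected bytes or text, got %r" % type(value))


print(to_text(b"foo") == u"foo")  # bytes are decoded
print(to_text(u"foo") == u"foo")  # text is returned unchanged
```

Unlike six.text_type, this refuses non-string inputs such as None instead of silently turning them into text, which may or may not be what you want.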
Upvotes: 4
Reputation: 71
Using six.text_type should suffice virtually always, just like the accepted answer says.
On a side note, and FYI, you could get yourself into trouble in Python 3 if you somehow feed a bytes instance to it (although this should be really hard to do).
CONTEXT
six.text_type is basically an alias for str in Python 3:
>>> import six
>>> six.text_type
<class 'str'>
Surprisingly, using str to cast bytes instances gives somewhat unexpected results:
>>> six.text_type(b'bytestring')
"b'bytestring'"
Notice how our string just got mangled? Straight from str's docs:
"Passing a bytes object to str() without the encoding or errors arguments falls under the first case of returning the informal string representation."
That is, str(...) will actually call the object's __str__ method, unless you pass an encoding:
>>> b'bytestring'.__str__()
"b'bytestring'"
>>> six.text_type(b'bytestring', encoding='utf-8')
'bytestring'
Sadly, if you do pass an encoding, "casting" regular str instances will no longer work:
>>> six.text_type('string', encoding='utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: decoding str is not supported
On a somewhat related note, casting None values can be troublesome as well:
>>> six.text_type(None)
'None'
You'll end up with a 'None' string, literally. Probably not what you wanted.
ALTERNATIVES
Just use six.text_type. Really. There's nothing to worry about unless you interact with bytes on purpose. Make sure to check for None before casting, though.
Use Django's force_text. It's the safest way out of this madness if you happen to be working on a project that's already using Django 1.x.x.
Copy-paste Django's force_text into your project. Here's a sample implementation.
For either of the Django alternatives, keep in mind that force_text allows you to specify strings_only=True to neatly preserve None values:
>>> force_text(None)
'None'
>>> type(force_text(None))
<class 'str'>
>>> force_text(None, strings_only=True)
>>> type(force_text(None, strings_only=True))
<class 'NoneType'>
Be careful, though: with strings_only=True it also leaves several other primitive types uncast:
>>> force_text(100)
'100'
>>> force_text(100, strings_only=True)
100
>>> force_text(True)
'True'
>>> force_text(True, strings_only=True)
True
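If you'd rather not pull in Django just for this, the pitfalls above (bytes and None) can be handled in a small helper of your own. The following is a rough sketch, not Django's implementation: the keep_none and encoding parameters are invented for illustration, and it assumes byte inputs are UTF-8.

```python
import sys

PY3 = sys.version_info[0] >= 3
# Only the selected branch is evaluated, so `unicode` is never looked up on Python 3.
text_type = str if PY3 else unicode  # noqa: F821
binary_type = bytes if PY3 else str


def ensure_unicode(value, encoding="utf-8", keep_none=True):
    """Hypothetical helper: lenient conversion to text."""
    if value is None and keep_none:
        return None  # avoid the literal 'None' string
    if isinstance(value, text_type):
        return value
    if isinstance(value, binary_type):
        return value.decode(encoding)  # avoid "b'...'" on Python 3
    return text_type(value)  # everything else goes through the plain cast


print(ensure_unicode(b"bytestring") == u"bytestring")
print(ensure_unicode(None) is None)
print(ensure_unicode(100) == u"100")
```

Unlike strings_only=True, this still casts ints and booleans to text and only leaves None alone, which is closer to what the question asks for.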
Upvotes: 4
Reputation: 1123500
Don't re-invent the compatibility layer wheel. Use the six compatibility layer, a small one-file project that can be included with your own:
Six supports every Python version since 2.6. It is contained in only one Python file, so it can be easily copied into your project. (The copyright and license notice must be retained.)
It includes a six.text_type() callable that does exactly this: convert a value to Unicode text.
import six
unicode_x = six.text_type(x)
In the project source code this is defined as:
import sys

PY2 = sys.version_info[0] == 2
PY3 = sys.version_info[0] == 3
# ...
if PY3:
    # ...
    text_type = str
    # ...
else:
    # ...
    text_type = unicode
    # ...
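To show how little is going on behind that alias, here is a sketch of the same idea using a NameError check instead of a version check. This is not how six does it, just a common idiom, and it is meant as illustration rather than as a reason to skip six.

```python
try:
    text_type = unicode  # Python 2: the builtin exists  # noqa: F821
except NameError:
    text_type = str  # Python 3: `unicode` is gone and str is already Unicode text


def ensure_unicode(x):
    # Same behaviour as the version check in the question, minus the branching.
    return text_type(x)


print(ensure_unicode(42) == u"42")
```

The caveats about bytes and None inputs apply to this alias exactly as they do to six.text_type.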
Upvotes: 23