Using Python 2.X's locale module to format numbers and currency

Question

The task is to format numbers, currency amounts and dates as unicode strings in a locale-aware manner.

A first naive attempt with numbers gave hope:

Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_Australia.1252'
>>> locale.format("%d", 12345678, grouping=True)
'12,345,678'
>>> locale.format(u"%d", 12345678, grouping=True)
u'12,345,678'
>>>

Now try French:

>>> locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'
>>> locale.format("%d", 12345678, grouping=True)
'12\xa0345\xa0678'
>>> locale.format(u"%d", 12345678, grouping=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python27\lib\locale.py", line 190, in format
    return _format(percent, value, grouping, monetary, *additional)
  File "C:\python27\lib\locale.py", line 211, in _format
    formatted, seps = _group(formatted, monetary=monetary)
  File "C:\python27\lib\locale.py", line 160, in _group
    left_spaces + thousands_sep.join(groups) + right_spaces,
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

What is happening here?

>>> locale.localeconv() # output edited for brevity
{'thousands_sep': '\xa0', 'mon_thousands_sep': '\xa0', 'currency_symbol': '\x80'}

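Wah! Looks a little legacyish: the separators and the currency symbol are byte strings in the locale's code page (cp1252 here), not unicode. A work-around suggests itself: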

>>> locale.format("%d", 12345678, grouping=True).decode(locale.getpreferredencoding())
u'12\xa0345\xa0678'
>>>
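To see what those bytes actually are, here is a small sketch that decodes the two values from the localeconv() output above by hand. The dict `raw` is just a copy of those values, and 'cp1252' is assumed from the 'French_France.1252' locale name; nothing here calls setlocale(), so it should run on any platform.

```python
import unicodedata

# Byte values copied from the localeconv() output above (French_France.1252).
raw = {'thousands_sep': b'\xa0', 'currency_symbol': b'\x80'}

# Decode each value with the locale's code page and look up its Unicode name.
names = dict((key, unicodedata.name(value.decode('cp1252')))
             for key, value in raw.items())

# Mixing a unicode format string with these bytes makes Python 2 attempt an
# implicit ASCII decode, which cannot handle any byte above 0x7f.
try:
    b'\xa0'.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

So the thousands separator is a no-break space and the currency symbol is the euro sign; both are perfectly sensible once decoded with the right code page, and both are outside ASCII, which is why the u"%d" call blows up.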

UPDATE 1: locale.getpreferredencoding() is NOT the way to go, because it does not follow setlocale(); use locale.getlocale()[1] instead:

Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding(), locale.getlocale()
('cp1252', (None, None))
>>> locale.setlocale(locale.LC_ALL, '')
'English_Australia.1252'
>>> locale.getpreferredencoding(), locale.getlocale()
('cp1252', ('English_Australia', '1252'))
>>> locale.setlocale(locale.LC_ALL, 'russian_russia')
'Russian_Russia.1251'
>>> locale.getpreferredencoding(), locale.getlocale()
('cp1252', ('Russian_Russia', '1251')) #### Whoops! ####
>>>
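A sketch of why the mismatch matters in practice. The byte string below is a made-up stand-in (the Russian word for December encoded as cp1251), not output captured from a real session; the two encoding names are the ones reported in the session above.

```python
# Illustrative stand-in for a byte string returned under Russian_Russia.1251:
# the Russian word for "December", encoded as cp1251.
expected = u'\u0434\u0435\u043a\u0430\u0431\u0440\u044c'
raw = expected.encode('cp1251')

# Encodings as reported above after setlocale(LC_ALL, 'russian_russia').
preferred = 'cp1252'   # locale.getpreferredencoding()
from_locale = '1251'   # locale.getlocale()[1]

wrong = raw.decode(preferred)    # no error, but the result is Latin mojibake
right = raw.decode(from_locale)  # round-trips to the original Cyrillic text
```

The nasty part is that the wrong decode does not raise: every byte in that string is also valid cp1252, so the wrong encoding just quietly produces garbage.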

UPDATE 2: There are very similar problems with the strftime() family and with str.format() and the built-in format():

>>> locale.setlocale(locale.LC_ALL, 'french_france')
'French_France.1252'
>>> format(12345678, 'n')
'12\xa0345\xa0678'
>>> format(12345678, u'n') # type triggers cast to unicode somehow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2: ordinal not in range(128)
>>> import datetime;datetime.date(1999,12,31).strftime(u'%B') # type is ignored
'd\xe9cembre'
>>>

In all cases, the workaround is to pass only str objects to these functions, take the str result, and decode it using the encoding obtained from locale.getlocale()[1].
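That workaround could be wrapped up roughly like this. `decode_locale_bytes` is a hypothetical helper name, and the 'ascii' fallback for a locale that reports no encoding is my assumption, not something the locale module prescribes.

```python
import locale


def decode_locale_bytes(value, encoding=None):
    """Return value as text, decoding byte strings with the locale's encoding.

    encoding defaults to locale.getlocale()[1]; 'ascii' is an assumed
    fallback for when no encoding is reported (e.g. before setlocale()).
    """
    if not isinstance(value, bytes):  # already unicode text: nothing to do
        return value
    if encoding is None:
        encoding = locale.getlocale()[1] or 'ascii'
    return value.decode(encoding)


# Byte strings taken from the sessions above, decoded with the '1252'
# encoding that getlocale()[1] reported for French_France.
number = decode_locale_bytes(b'12\xa0345\xa0678', '1252')
month = decode_locale_bytes(b'd\xe9cembre', '1252')
```

In real use you would call it as decode_locale_bytes(locale.format("%d", 12345678, grouping=True)) and let it pick up the encoding from the current locale. In Python 2, bytes is just an alias for str, so the isinstance check matches exactly the str results discussed above.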
