Hoopes

Reputation: 4171

Python default locale (unsupported locale setting)

This seems like a weird problem, and it's causing me some heartburn, because I'm using a library that stashes the current locale and later tries to set it back to what it stashed.

$ docker run --rm -it python:3.6 bash
root@bcee8785c2e1:/# locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
root@bcee8785c2e1:/# locale -a
C
C.UTF-8
POSIX
root@bcee8785c2e1:/# python
Python 3.6.9 (default, Jul 13 2019, 14:51:44) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> curr = locale.getlocale()
>>> curr
('en_US', 'UTF-8')
>>> locale.setlocale(locale.LC_ALL, curr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/locale.py", line 598, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
>>>

I'm not sure why getlocale is returning en_US. It's not anywhere in my environment variables (and I'm not sure where else it could come from in my shell).

In any case, I can't setlocale with the value from getlocale, which seems weird to me.

Does anyone have any guidance here?

Much appreciated!

Upvotes: 6

Views: 7856

Answers (2)

Rusi

Reputation: 1182

C.UTF-8 — A recent non-portable debianism

The intention behind C.UTF-8 is good, but the implementation is not quite there yet. For now, avoid it until it stabilizes.

Some discussion of context

A Red Hat discussion about including it, which means it's not quite there (at the time of writing, at least). Note in particular that Nick Coghlan, a core Python developer, suggests that Python doesn't get locales right in some contexts like this one.

A Haskell discussion showing that portable cross-platform tooling (in this case haskell-stack, but by implication also Docker) becomes harder and less reliable with C.UTF-8 usage.

The Intention

Debian (also) initiated C.UTF-8 and the intention is correct.

Today's Linux systems are intensively localized: a slew of locales, fine-grained LC_* choices, and so on. But none of this is on by default, and if the locale system is broken, the system is broken. The reason a broken locale system is not as drastic in its effects as, say, a broken kernel, fstab, or grub is...

The C locale

The C locale (synonym: POSIX) is guaranteed to always be available as a fallback if other things break. So, for example, you will see error messages in English rather than localized ones, but not mojibake, empty rectangles, or question marks!

By and large you get these kinds of warnings, not errors, and otherwise things keep working.

But C = POSIX implies legacy ASCII, not UTF-8, everywhere: an undesired side effect of legacy.

Towards making that legacy less and less necessary even as a fallback, Debian introduced the always available C.UTF-8 locale.

The catch? It's always available...

Only in Debian

That means recent Debian and recent derivatives such as Ubuntu, but not (yet) other systems.

In short, C.UTF-8 is not universal, not portable, fragile, and therefore best avoided, at least for now and at least on client-server or virtualized (containerized) systems like Docker. The...

Practical Upshot

You need to explicitly install old-fashioned locales like en_US.UTF-8. (People wanting a reasonable international English locale and not wanting en_US may wish to check out en_DK.UTF-8).
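Before relying on a locale such as en_US.UTF-8 inside a container, it can help to check from Python whether it is actually installed. The following is only a minimal sketch: the helper name locale_available is made up for illustration, and it assumes that a failing locale.setlocale() call (which raises locale.Error) is a good enough signal that the locale is missing.

```python
import locale


def locale_available(name, category=locale.LC_ALL):
    """Return True if `name` can be activated for `category`.

    Hypothetical helper: it probes by actually switching the locale,
    then always restores whatever was active before the probe.
    """
    # Calling setlocale() with no second argument only queries the setting.
    saved = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
    except locale.Error:
        return False
    finally:
        # The queried string can be passed straight back to setlocale().
        locale.setlocale(category, saved)
    return True
```

In the python:3.6 image from the question, where locale -a lists only C, C.UTF-8 and POSIX, one would expect locale_available("en_US.UTF-8") to be False until that locale has been generated in the image.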

Yeah that involves some amount of

Getting your hands dirty

Here is a collection of references on docker oriented locale setup

I don't approve of one anti-pattern that recurs in the above, but expanding on it would go too far afield from this question, so very briefly:

Setting the locale should usually only involve setting LANG. Setting LC_ALL, especially along with LANG, is a no-no.

From Debian wiki

⚠️ WARNING

Using LC_ALL is strongly discouraged as it overrides everything. Please use it only when testing and never set it in a startup file.
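To see why LC_ALL "overrides everything", here is a small sketch of the usual POSIX lookup order for a single category: LC_ALL first, then the category's own variable, then LANG. The function name effective_locale_name is hypothetical, and returning "C" when nothing is set is an assumption (POSIX leaves that default to the implementation).

```python
import os


def effective_locale_name(category, env=None):
    """Return the locale name the environment selects for one category.

    `category` is a variable name such as "LC_CTYPE" or "LC_TIME".
    Hypothetical helper that models the lookup order only; it does not
    check whether the resulting locale is installed.
    """
    env = os.environ if env is None else env
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:  # unset and empty both count as "not specified"
            return value
    return "C"  # assumption: treat the implementation default as "C"


# The environment from the question: only LANG carries a value.
question_env = {"LANG": "C.UTF-8", "LANGUAGE": "", "LC_ALL": ""}
print(effective_locale_name("LC_CTYPE", question_env))  # C.UTF-8

# Once LC_ALL is set, per-category choices are ignored.
mixed_env = {"LANG": "en_US.UTF-8", "LC_TIME": "en_DK.UTF-8", "LC_ALL": "C"}
print(effective_locale_name("LC_TIME", mixed_env))  # C
```

This is why setting only LANG leaves room for fine-grained LC_* overrides, while LC_ALL silently defeats them.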

Upvotes: 1

Giacomo Catenazzi

Reputation: 9533

For the first part: does it matter? As far as I know, you never see a difference until you call setlocale(), which brings us to the second part:

You should use:

import locale
curr = locale.getdefaultlocale()
locale.setlocale(locale.LC_ALL, curr)

That is, use getdefaultlocale() and not just getlocale(). I also do not fully understand the reason for having both. Is it possible that this is a Python bug that fails to recognize C.xxx?
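Another way to stash and restore the locale, which avoids round-tripping through the (language, encoding) tuple altogether, is to query the current setting with locale.setlocale(category) and hand that exact string back later. The sketch below assumes you control the stash/restore code; the context manager name preserved_locale is made up for illustration. Note also that getdefaultlocale() has been deprecated since Python 3.11.

```python
import locale
from contextlib import contextmanager


@contextmanager
def preserved_locale(category=locale.LC_ALL):
    """Restore the locale that was active on entry when the block exits.

    Hypothetical helper: setlocale() without a second argument returns
    the current setting as an opaque string, which setlocale() accepts
    again unchanged, so no name normalization is involved.
    """
    saved = locale.setlocale(category)  # query only, changes nothing
    try:
        yield saved
    finally:
        locale.setlocale(category, saved)


with preserved_locale():
    # Temporarily switch; "C" is always available.
    locale.setlocale(locale.LC_ALL, "C")
# Here the original setting is active again.
```

The string returned by the query may look odd (on glibc it can be a semicolon-separated list of per-category settings), but it is meant to be passed back as-is rather than parsed.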

Upvotes: 1
