Reputation: 4171
This seems like a weird problem, and it's causing me some heartburn, because I'm using a library that stashes the current locale and tries to set it back to what it stashed.
$ docker run --rm -it python:3.6 bash
root@bcee8785c2e1:/# locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
root@bcee8785c2e1:/# locale -a
C
C.UTF-8
POSIX
root@bcee8785c2e1:/# python
Python 3.6.9 (default, Jul 13 2019, 14:51:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> curr = locale.getlocale()
>>> curr
('en_US', 'UTF-8')
>>> locale.setlocale(locale.LC_ALL, curr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/locale.py", line 598, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
>>>
I'm not sure why getlocale() is returning en_US. It's not anywhere in my environment variables, and I'm not sure where else in my shell it could be set.
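One way to see where en_US comes from is to compare the raw setting reported by the C library with the tuple that getlocale() returns. getlocale() parses the raw name through locale.normalize(), which maps names via Python's alias table, so the two can disagree. This is a minimal diagnostic sketch. The printed values depend on the Python version and the container, so no output is claimed here.

```python
import locale

# Query (without changing) the current LC_CTYPE setting: passing only the
# category returns the raw string the C library reports.
raw = locale.setlocale(locale.LC_CTYPE)

# getlocale() parses that string into a (language, encoding) tuple and runs
# it through locale.normalize() on the way, so it may name another locale.
parsed = locale.getlocale(locale.LC_CTYPE)

# Applying normalize() to the raw string shows the alias mapping directly.
normalized = locale.normalize(raw)

print("raw setting:      ", raw)
print("normalize(raw):   ", normalized)
print("getlocale() tuple:", parsed)
```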
In any case, I can't call setlocale() with the value from getlocale(), which seems weird to me.
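If the goal is only to stash a locale and put it back later, one approach (assuming the library can be changed or wrapped) is to keep the raw string from setlocale() instead of the getlocale() tuple. C's setlocale() is specified to return a string that restores the setting when it is passed back, and Python's locale.setlocale() hands a str argument to it without normalizing it. preserved_locale is a hypothetical helper name, not part of the locale module.

```python
import contextlib
import locale


@contextlib.contextmanager
def preserved_locale(category=locale.LC_ALL):
    """Stash the current locale setting and restore it on exit (hypothetical helper).

    The stash is the opaque string returned by querying setlocale(), not the
    (language, encoding) tuple from getlocale(), so it is handed back to the
    C library exactly as the C library produced it.
    """
    saved = locale.setlocale(category)  # query only; nothing is changed here
    try:
        yield saved
    finally:
        locale.setlocale(category, saved)


if __name__ == "__main__":
    before = locale.setlocale(locale.LC_ALL)
    with preserved_locale():
        locale.setlocale(locale.LC_ALL, "C")  # "C" is always available
    print(locale.setlocale(locale.LC_ALL) == before)
```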
Does anyone have any guidance here?
Much appreciated!
Upvotes: 6
Views: 7856
Reputation: 1182
The intention behind C.UTF-8 is good, but the implementation is not quite there yet. For now, avoid it until it stabilizes.
There is a Red Hat discussion about including it, which means it's not quite there (at least at the time of writing). Note in particular that Nick Coghlan, a core Python developer, suggests that Python doesn't get locales right in some contexts like this one.
A Haskell discussion shows that portable cross-platform tooling (in this case haskell-stack, but by implication also Docker) becomes harder and less reliable when C.UTF-8 is used.
Debian (also) initiated C.UTF-8, and the intention is correct.
Today's Linux systems are intensively localized: a slew of locales, fine-grained choice of LC_* categories, and so on. But all this is not on by default: if the locale system is broken, the system is broken. The reason a broken locale system is not as drastic in its effects as, say, a broken kernel, fstab, or GRUB is...
The C locale (synonym POSIX) is guaranteed to always be available as a fallback if other things break. So, for example, you will see English rather than localized error messages, not mojibake, empty rectangles, or question marks!
By and large, you get these kinds of warnings, not errors, and otherwise things keep working.
But C (= POSIX) implies legacy ASCII rather than UTF-8 everywhere, an undesired side effect of legacy.
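The fallback behaviour described above can be sketched in Python: try the preferred locale names in order and settle on C, which is guaranteed to exist. The candidate list and the set_first_available name are illustrative assumptions.

```python
import locale


def set_first_available(candidates, category=locale.LC_ALL):
    """Try each locale name in order; fall back to "C" (hypothetical helper)."""
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            continue  # not installed or not supported on this system
    # The C (= POSIX) locale is guaranteed to exist, so this call cannot fail.
    return locale.setlocale(category, "C")


if __name__ == "__main__":
    chosen = set_first_available(["C.UTF-8", "en_US.UTF-8"])
    print("using locale:", chosen)
```

Note that C.UTF-8 may simply be missing on non-Debian systems, which is why it is only one candidate among several here.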
To make that legacy less and less necessary even as a fallback, Debian introduced the always-available C.UTF-8 locale.
The catch? It's always available...
on recent Debian and recent derivatives like Ubuntu, but not (yet) on other systems.
In short, C.UTF-8 is not universal, not portable, and fragile, and therefore best avoided... at least for now, and at least on client-server, virtualized (containerized), etc. systems like Docker. The....
You need to explicitly install old-fashioned locales like en_US.UTF-8. (People wanting a reasonable international English locale and not wanting en_US may wish to check out en_DK.UTF-8).
Yeah that involves some amount of
Here is a collection of references on Docker-oriented locale setup.
I don't approve of one anti-pattern that recurs in the above, but it would take us too far afield (from this question) to expand on it, so, very briefly:
Setting the locale should usually involve setting only LANG. Setting LC_ALL, especially along with LANG, is a no-no.
From the Debian wiki:
⚠️ WARNING
Using LC_ALL is strongly discouraged as it overrides everything. Please use it only when testing and never set it in a startup file.
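Following the "set only LANG" advice, here is a sketch for launching a child process: copy the environment, drop LC_ALL, set LANG. As a choice made by this sketch, it also drops every other LC_* variable so that none of them overrides LANG. lang_only_env is a hypothetical helper, and C.UTF-8 is only an example value.

```python
import os
import subprocess
import sys


def lang_only_env(lang, base=None):
    """Copy an environment, remove LC_ALL and all other LC_* variables, set LANG.

    With no LC_* variables left, every locale category falls back to LANG.
    """
    env = dict(os.environ if base is None else base)
    for key in list(env):
        if key.startswith("LC_"):
            del env[key]
    env["LANG"] = lang
    return env


if __name__ == "__main__":
    env = lang_only_env("C.UTF-8")
    # The child reports what it sees; LC_ALL shows up as None.
    subprocess.run([sys.executable, "-c",
                    "import os; print(os.environ.get('LANG'), os.environ.get('LC_ALL'))"],
                   env=env, check=True)
```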
Upvotes: 1
Reputation: 9533
For the first part: does it matter? As far as I know, you never see a difference until you call setlocale(), so on to the second part:
You should use:
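import locale
# On POSIX systems getdefaultlocale() reads LC_ALL, LC_CTYPE, LANG and
# LANGUAGE from the environment (deprecated since Python 3.11).
curr = locale.getdefaultlocale()
locale.setlocale(locale.LC_ALL, curr)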
So: getdefaultlocale(), and not just getlocale(). I also do not fully understand the reason for having both. Is it possible that this is a Python bug that fails to recognize C.xxx locales?
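An alternative that skips the tuple round-trip entirely is to pass an empty string, which tells the C library to take the locale from the environment (LC_ALL, LC_*, LANG) itself. This is a sketch, not a fix confirmed for the library in the question. apply_environment_locale and its C fallback are assumptions.

```python
import locale


def apply_environment_locale(fallback="C"):
    """Apply the locale named by the environment (hypothetical helper).

    Passing "" lets the C library read the environment itself, so the name is
    never parsed into a (language, encoding) tuple on the Python side. Falls
    back to `fallback` if the environment names a locale that is not installed.
    """
    try:
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        return locale.setlocale(locale.LC_ALL, fallback)


if __name__ == "__main__":
    print(apply_environment_locale())
```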
Upvotes: 1