Reputation: 37081
It's my understanding that Ruby's Encoding.default_external is given a default value based on the environment variables LC_ALL and LANG, giving precedence to the former. I've run into several bugs where the default external encoding somehow ends up set to ASCII even though the environment variables are set to UTF-8.
For example:
$ irb
irb(main):001:0> Encoding.default_external
=> #<Encoding:US-ASCII>
irb(main):002:0> ENV['LC_ALL']
=> nil
irb(main):003:0> ENV['LANG']
=> "en_US.UTF-8"
In the environments where this has happened, I've also grepped through all the gems being loaded for any code manually setting the default external encoding, but haven't found anything. How is what I'm seeing above possible? I'm using Ruby 2.2 above, but I've seen this happen on all Ruby 2.x versions.
Upvotes: 1
Views: 1297
Reputation: 37081
I figured it out. Not only does the LANG environment variable need to be set, but the locale it specifies must also have been generated on the OS. On a stock Linux image, the default locale may not be a UTF-8 one. In my particular case, I'm using Debian 7.7 and the default locale is "POSIX". I was able to set the default locale by installing the locales package and following the interactive prompts to generate the en_US.UTF-8 locale:
$ apt-get -y install locales
If the locales package is already installed, you can just reconfigure it instead:
$ dpkg-reconfigure locales
Now setting LANG will change the current system locale, and Ruby's Encoding.default_external will be set properly:
$ export LANG=en_US.UTF-8
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ irb
irb(main):001:0> Encoding.default_external
=> #<Encoding:UTF-8>
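If you want to check for this condition from Ruby itself, something like the following rough sketch may help. It compares the locale requested through the environment against the locales the OS reports as generated, then prints what Ruby ended up with. The helper names (requested_locale, normalize_locale, locale_installed?) are made up for illustration. The LC_CTYPE step reflects the usual POSIX lookup order rather than anything stated above. The name normalization is only a heuristic for the "UTF-8" vs "utf8" spelling difference, and the script assumes a `locale` command is on the PATH.

```ruby
# Resolve the locale name the C library would consult for character encoding.
# Usual POSIX precedence: LC_ALL, then LC_CTYPE, then LANG (empty counts as unset).
def requested_locale(env = ENV)
  %w[LC_ALL LC_CTYPE LANG].each do |name|
    value = env[name]
    return value unless value.nil? || value.empty?
  end
  nil
end

# Heuristic: treat "en_US.UTF-8" and "en_US.utf8" as the same name, since
# `locale -a` commonly prints the latter spelling.
def normalize_locale(name)
  name.downcase.delete("-")
end

# True if the requested locale appears in the list of installed locale names.
def locale_installed?(requested, installed)
  installed.map { |l| normalize_locale(l) }.include?(normalize_locale(requested))
end

if __FILE__ == $PROGRAM_NAME
  requested = requested_locale
  # Assumption: `locale -a` lists the generated locales, one per line.
  installed = `locale -a 2>/dev/null`.split("\n")

  puts "requested locale: #{requested.inspect}"
  puts "installed?:       #{requested ? locale_installed?(requested, installed) : false}"
  puts "locale charmap:   #{Encoding.locale_charmap}"
  puts "default_external: #{Encoding.default_external}"
end
```

If "installed?" comes back false while LANG names a UTF-8 locale, that matches the situation described above: the variable is set, but the locale was never generated, so Ruby falls back to US-ASCII.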
For an example of how to automate the generation and configuration of the default locale instead of doing it interactively, take a look at this Docker image.
Upvotes: 6