Jimmy

Reputation: 37081

Why doesn't Encoding.default_external respect LANG?

It's my understanding that Ruby's Encoding.default_external is given a default value based on the environment variables LC_ALL and LANG, giving precedence to the former. I've run into several bugs where the default external encoding somehow ends up set to ASCII even though the environment variables are set to UTF-8.

For example:

$ irb
irb(main):001:0> Encoding.default_external
=> #<Encoding:US-ASCII>
irb(main):002:0> ENV['LC_ALL']
=> nil
irb(main):003:0> ENV['LANG']
=> "en_US.UTF-8"

In the environments where this has happened, I've also grepped through all the loaded gems for any code that sets the default external encoding manually, but haven't found anything. How is the output above possible? The session above is on Ruby 2.2, but I've seen this happen on all Ruby 2.x versions.

Upvotes: 1

Views: 1297

Answers (1)

Jimmy

Reputation: 37081

I figured it out. Not only does the LANG environment variable need to be set, but the locale it specifies must also have been generated on the OS. On a stock Linux image, the default locale may not be a UTF-8 locale, and locales other than the default may not have been generated at all. In my particular case, I'm using Debian 7.7 and the default locale is "POSIX". I was able to set the default locale by installing the locales package and following the interactive prompts to generate the en_US.UTF-8 locale:

$ apt-get -y install locales

If the locales package is already installed, you can just reconfigure it instead:

$ dpkg-reconfigure locales

Now setting LANG will change the current system locale, and Ruby's Encoding.default_external will be set properly:

$ export LANG=en_US.UTF-8
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ irb
irb(main):001:0> Encoding.default_external
=> #<Encoding:UTF-8>
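
To check up front whether the locale named by LANG has actually been generated, something like the following sketch may help. It assumes a glibc system, where `locale -a` lists generated locales and typically prints en_US.UTF-8 as en_US.utf8. The function names `normalize_locale` and `locale_in_list` are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any package.

# Normalize a locale name for comparison: lowercase it and drop hyphens,
# so "en_US.UTF-8" and "en_US.utf8" compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-'
}

# Succeed if the locale named in $1 appears in the list read from stdin
# (one locale per line, as printed by `locale -a`).
locale_in_list() {
    wanted=$(normalize_locale "$1")
    while IFS= read -r name; do
        [ "$(normalize_locale "$name")" = "$wanted" ] && return 0
    done
    return 1
}

# LC_ALL takes precedence over LANG; fall back to POSIX if neither is set.
wanted="${LC_ALL:-${LANG:-POSIX}}"
if locale -a 2>/dev/null | locale_in_list "$wanted"; then
    echo "$wanted is generated"
else
    echo "$wanted is NOT generated; programs will fall back to the C/POSIX locale"
fi
```

When the locale is missing, `locale charmap` reports ANSI_X3.4-1968 (another name for US-ASCII) instead of UTF-8, which matches the US-ASCIIreported by Ruby in the question.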

For an example of how to automate the generation and configuration of the default locale instead of doing it interactively, take a look at this Docker image.
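
As a rough idea of what a non-interactive setup can look like on Debian, here is a minimal sketch. It is not taken from the image mentioned above. It assumes the stock /etc/locale.gen layout, where available locales are listed as commented-out lines such as "# en_US.UTF-8 UTF-8". The function names `enable_locale_entry` and `generate_default_locale` are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch assuming Debian's /etc/locale.gen format.

# Uncomment one entry (e.g. "en_US.UTF-8 UTF-8") in a locale.gen-style file.
# Uses GNU sed's -i option, which is what Debian ships.
enable_locale_entry() {
    entry="$1"
    file="$2"
    sed -i "s/^#[[:space:]]*\(${entry}\)[[:space:]]*\$/\1/" "$file"
}

# Install the locales package, generate the locale, and make it the
# system default. Must be run as root.
generate_default_locale() {
    name="$1"                # e.g. en_US.UTF-8
    charset="${2:-UTF-8}"
    apt-get update || return 1
    # DEBIAN_FRONTEND=noninteractive suppresses the debconf prompts.
    DEBIAN_FRONTEND=noninteractive apt-get install -y locales || return 1
    enable_locale_entry "$name $charset" /etc/locale.gen
    locale-gen || return 1
    update-locale LANG="$name"
}
```

Calling `generate_default_locale en_US.UTF-8` as root, for example from a provisioning script, would then stand in for the interactive prompts. LANG still has to be set in the environment of the process that starts Ruby.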

Upvotes: 6
