adaxa

Reputation: 1608

How do I use String methods on UTF-8 characters?

For example, I have a string with Cyrillic characters, and calling string.upcase on it has no effect.

Upvotes: 2

Views: 875

Answers (5)

dlauzon

Reputation: 1311

Ruby 2.4+ now supports more Unicode case mappings (upcase / downcase).

e.g.

String#downcase: "Full Unicode case mapping, suitable for most languages (see :turkic and :lithuanian options below for exceptions). Context-dependent case mapping as described in Table 3-14 of the Unicode standard is currently not supported."

Some details and use cases can be found here.
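
As a minimal sketch of what this looks like in practice, assuming Ruby 2.4 or later and UTF-8 string literals (the sample strings here are illustrative, not from the linked material), the Cyrillic case from the question and the :turkic option mentioned in the quoted documentation behave roughly as follows:

```ruby
# Assumes Ruby 2.4+; source files and string literals are UTF-8 by default.
cyrillic = "текст"

upper = cyrillic.upcase        # full Unicode case mapping
lower = upper.downcase

puts upper                     # ТЕКСТ
puts lower                     # текст

# Language-specific behaviour has to be requested explicitly.
puts "I".downcase              # i
puts "I".downcase(:turkic)     # ı (dotless i)
puts "i".upcase(:turkic)       # İ (capital I with dot)
```

Without the :turkic option, the default mapping treats I and i the way most Latin-script languages do.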

Upvotes: 0

Jörg W Mittag

Reputation: 369438

Ruby only supports case conversions on the letters A-Z and a-z.

The reason for this is simply that case conversions for other letters aren't well defined. For example, in Turkish 'I'.downcase # => 'ı' and 'i'.upcase # => 'İ', but in French 'I'.downcase # => 'i' and 'i'.upcase # => 'I'. Ruby would have to know not only the character encoding, but also the language to do that correctly.

Even worse, in German

'MASSE'.downcase

is either

'maße'   # "measurements"
'masse'  # "mass"

In other words: you need to actually understand the text, i.e. you need a full-blown AI, to do case conversions correctly.
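
The ambiguity can also be seen from the other direction. A small sketch, assuming Ruby 2.4 or later (where upcase applies full Unicode case mapping and turns ß into SS): both words upcase to the same string, so no downcase can recover the original spelling without knowing which word was meant.

```ruby
# Assumes Ruby 2.4+, where "ß".upcase is "SS".
words = ["maße", "masse"]

upcased = words.map(&:upcase)
puts upcased.inspect             # ["MASSE", "MASSE"]

# The round trip loses the distinction: downcase always yields "masse".
round_trip = upcased.map(&:downcase)
puts round_trip.inspect          # ["masse", "masse"]
```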

I once accidentally constructed a sentence myself that was undecidable even for a human.

In short: it's simply impossible to do correctly, which is why Ruby doesn't do it at all. There are third-party libraries, however, like the Unicode library and ActiveSupport, which do support a somewhat larger subset of characters.

Upvotes: 8

Aleksander Pohl

Reputation: 1695

Unfortunately, there is no support for Unicode downcase/upcase in Ruby 1.9, because of the problems described in other posts. Still, you can write your own gem that adds support for Cyrillic. You can look at my gem for Polish, where turning on proper case folding is as easy as:

gem 'string_case_pl'

It also provides proper string sorting for Polish.

Upvotes: 0

Rustam Gasanov

Reputation: 15781

require 'active_support/core_ext/string'  # mb_chars is provided by ActiveSupport
"ТЕКСТ".mb_chars.downcase.to_s # => "текст"

Upvotes: 0

tjwallace

Reputation: 5678

The Rails ActiveSupport gem (activesupport) has string extensions that can handle this.

For example:

# $ sudo gem install activesupport
require 'active_support/core_ext/string'
'Laurent, où sont les tests ?'.mb_chars.upcase.to_s
# outputs => "LAURENT, OÙ SONT LES TESTS ?"

Upvotes: 7
