user2061221

Reputation: 23

Internationalization (i18n) of linguistically dependent partial substring

Take a sentence like this:

"I visited this place {0} {1} ago."

Here {0} stands for an integer and {1} for the word "year" or "years", as appropriate. In Russian, however, the form of "year" depends on the exact number rather than just on singular versus plural (год, года, лет). So any rule that only distinguishes "year" from "years" is insufficient for Russian.

What I need to know is this: is there a way to add rules in the resource bundle or in the source code while keeping the entire string, or do I have to split the string into

"I visited this place {0} " + "{1} " + "ago."

... expanding the rule in the source code? How do you handle problems like this? Is there any best practice?

Upvotes: 2

Views: 795

Answers (2)

Paweł Dyda

Reputation: 18662

To some extent you have already answered your own question. You should not concatenate strings. Placeholders are meant for numbers, dates and dynamic text.
I would argue that the unit of measurement (time, in this case) is not dynamic text.

How can you resolve this problem?

I'll give you some basic blueprints of two ideas. Both require using full sentences.

  1. You can rearrange the sentence so that the plural problem disappears, e.g. "The place has been visited this number of years|months|days|hours|minutes ago: {0}".
    This has obvious drawbacks and does not sound natural. And although I can't give you an example of a language where this approach fails, there is a non-zero probability that such a language exists (Slavic languages are not among them, that much is certain).

  2. Use a rule-based selection method to pick the valid plural form from the resource files. To do that you need to know a bit about Language Plural Rules. You can apply the CLDR plural rules on your own, or you can decide on something else, such as wrapping ICU4C's PluralRules class and using its select method to choose the valid plural form.
    The ICU Project site even lists existing wrappers that you can use in your C# application, namely GenICUWrapper and ICU-Dotnet.

Personally, I'd recommend the latter method (with an ICU wrapper). You may want to see my answer to a similar problem, with a solution in Java. I believe a .NET solution would be based on the same idea, only you would use string.Format() instead of MessageFormat and you would read the resources the .NET way (in whatever style you actually prefer).
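
To make the second idea concrete, here is a minimal sketch in Python (not C#, purely for illustration). It assumes the CLDR plural categories for integers in Russian (one, few, many) and stores one complete sentence per category. The names MESSAGES, plural_category and visited_ago are hypothetical, the Russian sentences are my own illustrative translations, and a real application would more likely delegate category selection to ICU's PluralRules than hand-code it.

```python
# Hypothetical resource table: one complete sentence per plural category,
# so translators never have to work with a sentence fragment.
MESSAGES = {
    "en": {
        "one": "I visited this place {0} year ago.",
        "other": "I visited this place {0} years ago.",
    },
    "ru": {
        "one": "Я посетил это место {0} год назад.",
        "few": "Я посетил это место {0} года назад.",
        "many": "Я посетил это место {0} лет назад.",
    },
}


def plural_category(locale: str, n: int) -> str:
    """Return the plural category name for a non-negative integer n."""
    if locale == "ru":
        mod10, mod100 = n % 10, n % 100
        if mod10 == 1 and mod100 != 11:
            return "one"   # 1, 21, 31, ... but not 11
        if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
            return "few"   # 2-4, 22-24, ... but not 12-14
        return "many"      # 0, 5-20, 25-30, ...
    # Assumption: every other locale in this sketch uses the English rule.
    return "one" if n == 1 else "other"


def visited_ago(locale: str, years: int) -> str:
    """Pick the whole-sentence template first, then fill in the number."""
    category = plural_category(locale, years)
    return MESSAGES[locale][category].format(years)


for n in (1, 3, 5, 11, 21):
    print(visited_ago("ru", n))
```

The point of the design is that the number only decides which full sentence is used; nothing is ever glued together from translated pieces.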

Upvotes: 1

Jon

Reputation: 437554

The golden rule of i18n

Don't produce localized output by concatenating localized strings. Ever. For any reason.


Here you are violating this rule by inserting the localized form of "year/years" midway through a larger piece of text. For your specific example, this is easy to work around -- just localize "N year(s)" as a whole and insert that one instead -- but that would not really solve the problem. There are languages with structures even more dependent on context where this approach would break down fatally at some point.

For best results you should localize the string as a whole. For the Russian locale the string should have 3 different forms depending on the value of the "years" parameter (I don't know Russian, so no idea which form would be used for what values).

I'm not sure what i18n technologies you are using, but gettext (which the question is tagged with) supports this out of the box.
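
As a rough illustration of how gettext-style plural handling works, here is a small Python stand-in. In a real project the gettext runtime evaluates the Plural-Forms expression stored in the translation catalog; the sketch below hand-codes the expression commonly used for Russian (three forms, indexed 0 to 2) and keeps the catalog in a dictionary. RU_CATALOG, ru_plural_index and ngettext_ru are hypothetical names, and the Russian strings are illustrative translations, not taken from any real catalog.

```python
SINGULAR = "I visited this place %d year ago."
PLURAL = "I visited this place %d years ago."

# Hypothetical stand-in for a compiled catalog: each (msgid, msgid_plural)
# pair maps to the list of translated forms msgstr[0], msgstr[1], msgstr[2].
RU_CATALOG = {
    (SINGULAR, PLURAL): [
        "Я посетил это место %d год назад.",
        "Я посетил это место %d года назад.",
        "Я посетил это место %d лет назад.",
    ],
}


def ru_plural_index(n: int) -> int:
    """Python equivalent of the plural expression commonly used for Russian."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


def ngettext_ru(singular: str, plural: str, n: int) -> str:
    """Look up the whole translated sentence for the count n."""
    forms = RU_CATALOG.get((singular, plural))
    if forms is None:
        # Untranslated message: fall back to the source-language rule.
        return singular if n == 1 else plural
    return forms[ru_plural_index(n)]


for n in (1, 2, 7, 12, 101):
    print(ngettext_ru(SINGULAR, PLURAL, n) % n)
```

The calling code passes only the two source-language sentences and the count; the number of forms and the rule for choosing between them live with the translation, not in the application logic.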

Upvotes: 3
