I have a small project where I'm trying to localize a piece of Japanese software. Unfortunately, I cannot change the byte length of several strings, and they're hardcoded to use UTF-16, which means a single kanji that can represent a complicated word can only be replaced with a single Latin letter. (As opposed to UTF-8, where a Latin letter takes one byte and a kanji takes three.)
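To put numbers on that constraint, here is a small sketch in plain Python (the helper name `byte_budget` is made up for illustration). It compares how many bytes a string takes in UTF-16 and UTF-8 and how many ASCII letters would fit into the same space, assuming BMP characters only and UTF-16 without a byte-order mark.

```python
# Compare the byte cost of a string in UTF-16 and UTF-8, and how many ASCII
# letters would fit into the same number of bytes. Assumes BMP characters only.

def byte_budget(text: str) -> dict:
    utf16 = len(text.encode("utf-16-le"))  # "-le" variant: no byte-order mark
    utf8 = len(text.encode("utf-8"))
    return {
        "utf16_bytes": utf16,
        "utf8_bytes": utf8,
        # An ASCII letter costs 2 bytes in UTF-16 ...
        "latin_letters_utf16": utf16 // 2,
        # ... but only 1 byte in UTF-8.
        "latin_letters_utf8": utf8,
    }


if __name__ == "__main__":
    for s in ["消", "消す", "ヘルプ"]:
        print(s, byte_budget(s))
```

For 消す this gives room for 2 letters in UTF-16 but 6 in UTF-8, which is exactly enough for "delete".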
However, the fonts the software uses can be replaced freely.
One solution, then, would be to use a font whose glyphs each contain multiple Latin characters.
Does something like this already exist?
If you can use an OpenType font, use the GSUB mechanism, which is the most "natural" (insofar as that is a thing) way to localize through a font. By defining every character sequence in your localization map as its own substitution rule, you also avoid the overlap problems you get with a Private Use Area (PUA) mapping.
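First off, you define your font with the normal Latin character set you want to use, using a standard cmap subtable format 4 (effectively: like all modern fonts), and then you declare the rest of the CJK space as "supported" with a cmap subtable format 13 (many-to-one range mappings, intended for last-resort fonts) that says "all code points from U+2E80 through U+10FFFF map to glyph ID 1". One catch: GSUB rules match glyph IDs, not code points, so every kanji or kana that actually appears in a string you want to replace needs its own glyph ID (a blank placeholder glyph is enough); only characters that never take part in a rule can share the catch-all glyph. Now your font "supports" CJK but it won't actually be able to "draw" CJK text. And that's fine: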
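you then define (lots of) GSUB rules that turn each Japanese sequence into a Latin one. GSUB has no single many-to-many lookup type, so each rule is really a ligature substitution (many-to-one) into an intermediate glyph, followed by a multiple substitution (one-to-many) that expands that glyph into letters. Conceptually, the rules look like this: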
...
消_す -> d_e_l_e_t_e
ヘ_ル_プ -> h_e_l_p
...
so that the real magic happens: without changing anything in your actual application other than the font, you now get the correct English (or whatever language you need) strings purely by virtue of the font doing what it's supposed to do. It sees that it needs to shape the string 消す, goes "I support all the characters in that string", then goes "I have a substitution rule for this exact sequence", and then goes "I have shaped that sequence as delete, here it is."
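To make that sequence of steps concrete, here is a toy model in Python. It is not a real shaping engine: `cmap` stands in for the font's character-to-glyph lookup, and the two tables stand in for a ligature lookup (tried longest match first) and a multiple-substitution lookup. The glyph names `loc.delete` and `loc.help` are hypothetical placeholders.

```python
# Toy model of the shaping steps, NOT a real OpenType shaper.
# Glyph names and lookup contents are hypothetical.

def cmap(text: str) -> list:
    # Stand-in for the cmap lookup: one AGL-style glyph name per BMP character.
    return [f"uni{ord(c):04X}" for c in text]


# Ligature lookup (many-to-one): glyph sequence -> placeholder glyph.
LIGATURES = {
    tuple(cmap("消す")): "loc.delete",
    tuple(cmap("ヘルプ")): "loc.help",
}
# Multiple substitution (one-to-many): placeholder glyph -> Latin glyphs.
MULTIPLE = {
    "loc.delete": ["d", "e", "l", "e", "t", "e"],
    "loc.help": ["h", "e", "l", "p"],
}
LONGEST = max(len(seq) for seq in LIGATURES)


def apply_ligatures(glyphs: list) -> list:
    out, i = [], 0
    while i < len(glyphs):
        # Try the longest possible sequence first, down to two glyphs.
        for n in range(min(LONGEST, len(glyphs) - i), 1, -1):
            seq = tuple(glyphs[i:i + n])
            if seq in LIGATURES:
                out.append(LIGATURES[seq])
                i += n
                break
        else:
            out.append(glyphs[i])  # no rule starts here; keep the glyph
            i += 1
    return out


def apply_multiple(glyphs: list) -> list:
    # Expand each placeholder into its Latin glyphs; leave other glyphs alone.
    return [g for glyph in glyphs for g in MULTIPLE.get(glyph, [glyph])]


def shape(text: str) -> list:
    return apply_multiple(apply_ligatures(cmap(text)))


print("".join(shape("消す")))  # prints: delete
```

This only works because 消 and す map to different glyphs; if both went to one shared glyph, the ligature table could not tell 消す apart from any other two-character string.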
Of course, this may lead to localizations where your English label is too wide, and you might still need PUA to get around that. In that case you perform essentially the same trick: write out your word in Illustrator, Inkscape, or any vector editor with good text support, import the entire localized label as a single glyph using a font editor or a tool like TTX, and then set up a many-to-one GSUB rule (a ligature substitution) that points to that single glyph rather than to a new sequence:
...
報告書 -> {glyph id for the "Generate report" glyph in PUA}
...
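Deciding which labels need the single-glyph treatment can be automated. The sketch below is a rough illustration under loudly stated assumptions: every advance width is a made-up number in font units (not measured from any real font), the space available is taken to be the width of the original full-width Japanese string, and `single_glyph_rule` and the glyph name `report.generate` are hypothetical. The emitted line is feature-file syntax that would go inside a ligature lookup.

```python
# Hypothetical width check: all numbers are made-up font units, not measured.
CJK_ADVANCE = 1000                                # assumption: full-width CJK glyphs, 1000-unit em
LATIN_ADVANCE = {" ": 250, "i": 250, "l": 250}    # narrow glyphs (made up)
DEFAULT_ADVANCE = 550                             # every other glyph (made up)


def label_width(label: str) -> int:
    return sum(LATIN_ADVANCE.get(c, DEFAULT_ADVANCE) for c in label)


def single_glyph_rule(src: str, label: str, drawn_glyph: str):
    """Return a many-to-one feature-file rule if the spelled-out label is too
    wide for the space the Japanese string used to take, otherwise None."""
    if label_width(label) <= CJK_ADVANCE * len(src):
        return None  # letter-by-letter substitution fits
    names = " ".join(f"uni{ord(c):04X}" for c in src)
    return f"sub {names} by {drawn_glyph};"


print(single_glyph_rule("報告書", "Generate report", "report.generate"))
print(single_glyph_rule("消す", "del", "del.drawn"))  # None: "del" fits
```

With these invented widths, "Generate report" comes to 7950 units against a budget of 3000, so it gets a pre-drawn glyph, while "del" (1350 units) fits in the 2000 units of 消す.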
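And that should be it: exploiting OpenType to achieve localization without ever touching "real" code (aside from the code you write to automate font generation, of course). The one prerequisite is that the application draws its text through a shaping engine that actually applies GSUB features; a renderer that simply maps each character to a glyph will ignore the rules.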
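Part of that automation can be sketched with the standard library alone. The helpers below (`glyph_name`, `dedicated_code_points`, `fallback_groups`, `build_fea` are names made up for this sketch) work out which code points need their own glyph, compute format-13-style fallback ranges for every other code point from U+2E80 up (skipping the surrogate block), and emit AFDKO feature-file text that chains a ligature lookup into a multiple-substitution lookup. Assumptions: the font already contains glyphs with these AGL-style names plus placeholder glyphs named `loc.0`, `loc.1` and so on; the text engine applies `liga` under the default script; writing the actual cmap tables and compiling the feature text (e.g. with fontTools' feaLib) are not shown.

```python
# Hypothetical font-generation helpers (standard library only). Writing the
# cmap tables and compiling the feature text into a font are not shown.

CJK_START, UNICODE_MAX = 0x2E80, 0x10FFFF
SURROGATES = (0xD800, 0xDFFF)


def glyph_name(ch: str) -> str:
    """AGL-style glyph name for a character."""
    if ch.isascii() and ch.isalpha():
        return ch  # ASCII letters are named after themselves
    if ch == " ":
        return "space"
    cp = ord(ch)
    return f"uni{cp:04X}" if cp <= 0xFFFF else f"u{cp:05X}"


def dedicated_code_points(mapping: dict) -> list:
    """Code points used in a rule; each needs its own (possibly blank) glyph."""
    return sorted({ord(c) for src in mapping for c in src if ord(c) >= CJK_START})


def fallback_groups(used: list, fallback_gid: int = 1) -> list:
    """(start, end, glyph_id) ranges, format-13 style, for every other code point."""
    holes = sorted([(cp, cp) for cp in used] + [SURROGATES])
    groups, start = [], CJK_START
    for lo, hi in holes + [(UNICODE_MAX + 1, UNICODE_MAX + 1)]:
        if lo > start:
            groups.append((start, lo - 1, fallback_gid))
        start = max(start, hi + 1)
    return groups


def build_fea(mapping: dict, feature: str = "liga") -> str:
    """Ligature lookup (many-to-one) chained into a multiple subst (one-to-many)."""
    # Longest sources first so a shorter rule cannot pre-empt a longer one.
    items = sorted(mapping.items(), key=lambda kv: -len(kv[0]))
    lig, mult = [], []
    for i, (src, dst) in enumerate(items):
        if len(src) < 2 or len(dst) < 2:
            raise ValueError(f"{src!r} -> {dst!r}: needs 2+ characters on both sides")
        placeholder = f"loc.{i}"
        lig.append(f"    sub {' '.join(map(glyph_name, src))} by {placeholder};")
        mult.append(f"    sub {placeholder} by {' '.join(map(glyph_name, dst))};")
    return "\n".join([
        f"feature {feature} {{",
        "  lookup JA_TO_PLACEHOLDER {",
        *lig,
        "  } JA_TO_PLACEHOLDER;",
        "  lookup PLACEHOLDER_TO_LATIN {",
        *mult,
        "  } PLACEHOLDER_TO_LATIN;",
        f"}} {feature};",
    ])


rules = {"消す": "delete", "ヘルプ": "help"}
print(build_fea(rules))
print(fallback_groups(dedicated_code_points(rules))[:2])
```

Single-character sources or targets would need a different lookup type (single substitution), so this sketch rejects them rather than emitting a lookup that mixes types.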
I can also strongly recommend joining https://typedrawers.com to ask people for help with the actual implementation and tooling. This is almost trivial in an application like FontCreator and hardly a lot of work in something like FontForge, but TypeDrawers is where the typography engineers hang out, and you'll be far more likely to get detailed help there than on Stack Overflow.