Reputation: 306
I am looking to utilize the ICU library for transliteration, but I would like to provide a custom transliteration file for a set of specific custom transliterations, to be incorporated into the ICU core at compile time for use in binary form elsewhere. I am working with the source of ICU 4.2 for compatibility reasons.
As I understand it, from the ICU Data page of their website, one way of going about this is to create the file trnslocal.mk within ICUHOME/source/data/translit/ , and within this file have the single line TRANSLIT_SOURCE_LOCAL=custom.txt
.
For the custom.txt
file itself, I used the following format, based on the master file root.txt
:
custom{
RuleBasedTransliteratorIDs {
Kanji-Romaji {
file {
resource:process(transliterator){"custom/Kanji_Romaji.txt"}
direction{"FORWARD"}
}
}
}
TransliteratorNamePattern {
// Format for the display name of a Transliterator.
// This is the language-neutral form of this resource.
"{0,choice,0#|1#{1}|2#{1}-{2}}" // Display name
}
// Transliterator display names
// This is the English form of this resource.
"%Translit%Hex" { "%Translit%Hex" }
"%Translit%UnicodeName" { "%Translit%UnicodeName" }
"%Translit%UnicodeChar" { "%Translit%UnicodeChar" }
TransliterateLATIN{
"",
""
}
}
I then store within the directory custom
the file Kanji_Romaji.txt
, as found here. Because it uses >
instead of the →
I have seen in other files, I converted each entry appropriately, so they now look like:
丁 → Tei ;
七 → Shichi ;
When I compile the ICU project, I am presented with no errors.
When I attempt to utilize this custom transliterator within a testfile, however (a testfile that works fine with the in-built transliterators), I am met with the error error: 65569:U_INVALID_ID
.
I am using the following code to construct the transliterator and output the error:
UErrorCode status = U_ZERO_ERROR;
Transliterator *K_R = Transliterator::createInstance("Kanji-Romaji", UTRANS_FORWARD, status);
if (U_FAILURE(status))
{
std::cout << "error: " << status << ":" << u_errorName(status) << std::endl;
return 0;
}
Additionally, a loop through to Transliterator::countAvailableIDs()
and Transliterator::getAvailableID(i)
does not list my custom transliteration. I remember reading with regard to custom converters that they must be registered within /source/data/mappings/convrtrs.txt . Is there a similar file for transliterators?
It seems that my custom transliterator is either not being built into the appropriate packages (though there are no compile errors), is improperly formatted, or somehow not being registered for use. Incidentally, I am aware of the RuleBasedTransliterator route at runtime, but I would prefer to be able to compile the custom transliterations for use in any produced binary.
Let me know if any additional clarification is necessary. I know there is at least one ICU programmer on here, who has been quite helpful in other posts I have written and seen elsewhere as well. I would appreciate any help I can find. Thank you in advance!
Upvotes: 4
Views: 3065
Reputation: 4350
Transliterators are sourced from CLDR - you could add your transliterator to CLDR (the crosswire directory contains it in XML format in the cldr/ directory) and rebuild ICU data. ICU doesn't have a simple mechanism for adding transliterators as you are trying to do. What I would do is forget about trnslocal.mk or custom.txt as you don't need to add any files, and simply modify root.txt - you might file a bug if you have a suggested improvement.
Upvotes: 3