How to convert Chinese traditional or simplified characters to Zhuyin phonetic notation?
Example
# simplified
没关系 --> ㄇㄟˊㄍㄨㄢㄒㄧ
# traditional
沒關係 --> ㄇㄟˊㄍㄨㄢㄒㄧ
The dragonmapper module does hanzi-to-zhuyin conversion (internally it converts to pinyin first and then to zhuyin):
# install dependencies: pip install dragonmapper
from dragonmapper import hanzi
print(hanzi.to_zhuyin('太阳'))
# output: ㄊㄞˋ ㄧㄤ˙
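A possible sequence, if you want to do the two steps yourself:
1. Convert the characters to pinyin, for example with pinyin4j in Java:
import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.*;
HanyuPinyinOutputFormat outputFormat = new HanyuPinyinOutputFormat();
outputFormat.setToneType(HanyuPinyinToneType.WITH_TONE_NUMBER);
outputFormat.setVCharType(HanyuPinyinVCharType.WITH_U_AND_COLON);
outputFormat.setCaseType(HanyuPinyinCaseType.LOWERCASE);
// takes ONE char, returns all its readings (null for non-Chinese); throws BadHanyuPinyinOutputFormatCombination
for (char c : chineseText.toCharArray()) {
    String[] pinyin = PinyinHelper.toHanyuPinyinStringArray(c, outputFormat);
}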
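Or, in Python, with pypinyin:
# install dependencies: pip install pypinyin
from pypinyin import pinyin
hanzi_text = '當然可以'
# pinyin() returns a list of readings for each character; keep the first one
pinyin_text = ' '.join([seg[0] for seg in pinyin(hanzi_text)])
print(pinyin_text)  # tone marks by default, e.g. 'dāng rán kě yǐ'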
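pypinyin's default output uses tone marks (the diacritics in 'dāng'), while a pinyin-to-zhuyin table is easier to key on tone numbers. If you control the call, pypinyin can produce numbers itself via `style=Style.TONE3`; otherwise a minimal sketch like the one below strips the marks using Unicode decomposition (NFD) and records the tone. It assumes one syllable per space-separated token and treats an unmarked syllable as neutral tone, written `5` here by convention (the function names are made up for illustration).

```python
import unicodedata

# Combining diacritics used for tones 1-4 in tone-marked pinyin.
TONE_MARKS = {
    "\u0304": "1",  # macron, as in ā
    "\u0301": "2",  # acute, as in á
    "\u030c": "3",  # caron, as in ǎ
    "\u0300": "4",  # grave, as in à
}


def to_numbered(syllable: str) -> str:
    """Convert one tone-marked syllable such as 'dāng' to 'dang1'.

    A syllable without a tone mark is assumed to be neutral tone ('5').
    """
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = "5"
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # this also keeps the diaeresis of ü (U+0308)
    # Recompose so that 'u' + U+0308 turns back into 'ü'.
    base = unicodedata.normalize("NFC", "".join(kept))
    return base + tone


def sentence_to_numbered(pinyin_text: str) -> str:
    """Apply to_numbered to every space-separated syllable."""
    return " ".join(to_numbered(s) for s in pinyin_text.split())


print(sentence_to_numbered("dāng rán kě yǐ"))  # dang1 ran2 ke3 yi3
print(to_numbered("lǜ"))                        # lü4
```

Decomposing first means the same code handles both precomposed characters (ǎ) and text that already arrives as a base letter plus combining mark.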
2. Once step #1 has given you a list of pinyin segments, split the pinyin into syllables and replace each one using a map such as this one or this one (in JS format).
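Step 2 can be sketched in Python as follows, assuming numbered pinyin input such as pinyin4j's `WITH_TONE_NUMBER` output (with `u:` or `v` for ü). The tables are hand-written for illustration and are not the linked maps; the code splits each syllable into an initial and a final and handles the usual spelling rules (y/w syllables, `ju` meaning jü, and the bare `zhi`/`ci`/`ri` syllables). Rare syllables such as `ê`, `yo` or the interjections `m`/`ng` are deliberately left out.

```python
# Hand-written illustrative tables (not taken from the maps linked above).
INITIALS = {  # two-letter initials first, so "zh" wins over "z"
    "zh": "ㄓ", "ch": "ㄔ", "sh": "ㄕ",
    "b": "ㄅ", "p": "ㄆ", "m": "ㄇ", "f": "ㄈ", "d": "ㄉ", "t": "ㄊ", "n": "ㄋ",
    "l": "ㄌ", "g": "ㄍ", "k": "ㄎ", "h": "ㄏ", "j": "ㄐ", "q": "ㄑ", "x": "ㄒ",
    "r": "ㄖ", "z": "ㄗ", "c": "ㄘ", "s": "ㄙ",
}
FINALS = {  # finals as spelled after an initial
    "a": "ㄚ", "o": "ㄛ", "e": "ㄜ", "ai": "ㄞ", "ei": "ㄟ", "ao": "ㄠ", "ou": "ㄡ",
    "an": "ㄢ", "en": "ㄣ", "ang": "ㄤ", "eng": "ㄥ", "ong": "ㄨㄥ",
    "i": "ㄧ", "ia": "ㄧㄚ", "ie": "ㄧㄝ", "iao": "ㄧㄠ", "iu": "ㄧㄡ", "ian": "ㄧㄢ",
    "in": "ㄧㄣ", "iang": "ㄧㄤ", "ing": "ㄧㄥ", "iong": "ㄩㄥ",
    "u": "ㄨ", "ua": "ㄨㄚ", "uo": "ㄨㄛ", "uai": "ㄨㄞ", "ui": "ㄨㄟ", "uan": "ㄨㄢ",
    "un": "ㄨㄣ", "uang": "ㄨㄤ",
    "ü": "ㄩ", "üe": "ㄩㄝ", "üan": "ㄩㄢ", "ün": "ㄩㄣ",
}
ZERO_INITIAL = {  # whole syllables without an initial (including y/w spellings)
    "a": "ㄚ", "o": "ㄛ", "e": "ㄜ", "ai": "ㄞ", "ei": "ㄟ", "ao": "ㄠ", "ou": "ㄡ",
    "an": "ㄢ", "en": "ㄣ", "ang": "ㄤ", "eng": "ㄥ", "er": "ㄦ",
    "yi": "ㄧ", "ya": "ㄧㄚ", "ye": "ㄧㄝ", "yao": "ㄧㄠ", "you": "ㄧㄡ", "yan": "ㄧㄢ",
    "yin": "ㄧㄣ", "yang": "ㄧㄤ", "ying": "ㄧㄥ", "yong": "ㄩㄥ",
    "yu": "ㄩ", "yue": "ㄩㄝ", "yuan": "ㄩㄢ", "yun": "ㄩㄣ",
    "wu": "ㄨ", "wa": "ㄨㄚ", "wo": "ㄨㄛ", "wai": "ㄨㄞ", "wei": "ㄨㄟ", "wan": "ㄨㄢ",
    "wen": "ㄨㄣ", "wang": "ㄨㄤ", "weng": "ㄨㄥ",
}
# Tone 1 is unmarked. The neutral-tone dot is appended after the syllable, as in
# the dragonmapper output above (Taiwanese usage often places it before instead).
TONES = {1: "", 2: "\u02ca", 3: "\u02c7", 4: "\u02cb", 5: "\u02d9"}  # ˊ ˇ ˋ ˙


def syllable_to_zhuyin(syllable: str) -> str:
    s = syllable.strip().lower().replace("u:", "ü").replace("v", "ü")
    tone = 5  # a syllable without a tone digit is read as neutral tone
    if s and s[-1].isdigit():
        tone, s = int(s[-1]), s[:-1]
    if s in ZERO_INITIAL:
        return ZERO_INITIAL[s] + TONES[tone]
    for py_initial, zy_initial in INITIALS.items():
        if s.startswith(py_initial):
            final = s[len(py_initial):]
            break
    else:
        raise ValueError(f"unrecognised syllable: {syllable!r}")
    if py_initial in ("j", "q", "x") and final.startswith("u"):
        final = "ü" + final[1:]  # ju, que, xuan are written without the dots
    if py_initial in ("zh", "ch", "sh", "r", "z", "c", "s") and final == "i":
        body = zy_initial  # zhi, ci, ri...: the initial stands alone in zhuyin
    elif final in FINALS:
        body = zy_initial + FINALS[final]
    else:
        raise ValueError(f"unrecognised syllable: {syllable!r}")
    return body + TONES[tone]


def pinyin_to_zhuyin(text: str) -> str:
    """Convert space-separated numbered pinyin, e.g. 'mei2 guan1 xi5'."""
    return " ".join(syllable_to_zhuyin(s) for s in text.split())


print(pinyin_to_zhuyin("mei2 guan1 xi5"))  # ㄇㄟˊ ㄍㄨㄢ ㄒㄧ˙
print(pinyin_to_zhuyin("tai4 yang5"))      # ㄊㄞˋ ㄧㄤ˙
```

Normalising the y/w spellings through a whole-syllable table keeps the initial/final split simple: every remaining syllable starts with exactly one of the 21 initials.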
Another option is to map Chinese characters directly to zhuyin using one of the available mapping files, such as this one: https://github.com/osfans/rime-tool/blob/master/data/y/taiwan.dict.yaml. The downside is that this particular source only covers Simplified characters, not Traditional ones.
UPDATE: The mapping from the libchewing project covers both Simplified and Traditional characters (plus frequency data and entries for multi-character words): see word.src (400 KB) and tsi.src (5.2 MB). To handle multi-character words properly you'll probably also want a decent Chinese word segmentation library, such as jieba (Python) or jieba-analysis (Java).
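With a direct character/word-to-zhuyin mapping, segmentation matters because a character's reading can depend on the word it sits in. The sketch below swaps in simple forward maximum matching (greedily take the longest dictionary entry at each position) instead of jieba. `WORD_TO_ZHUYIN` is a tiny hypothetical dictionary written by hand; loading the real word.src / tsi.src files is not shown, since their format isn't described here.

```python
# Hypothetical sample data; a real program would load many thousands of entries
# from a mapping file (e.g. libchewing's data), which is not parsed here.
WORD_TO_ZHUYIN = {
    # multi-character words, traditional and simplified spellings
    "沒關係": "ㄇㄟˊ ㄍㄨㄢ ㄒㄧ",
    "没关系": "ㄇㄟˊ ㄍㄨㄢ ㄒㄧ",
    # single characters, used when no longer word matches
    "沒": "ㄇㄟˊ", "没": "ㄇㄟˊ",
    "關": "ㄍㄨㄢ", "关": "ㄍㄨㄢ",
    "係": "ㄒㄧˋ", "系": "ㄒㄧˋ",
}
MAX_WORD_LEN = max(len(word) for word in WORD_TO_ZHUYIN)


def to_zhuyin(text: str) -> str:
    """Forward maximum matching: at each position use the longest known entry."""
    out = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in WORD_TO_ZHUYIN:
                out.append(WORD_TO_ZHUYIN[chunk])
                i += length
                break
        else:
            out.append(text[i])  # unknown character: pass it through unchanged
            i += 1
    return " ".join(out)


print(to_zhuyin("沒關係"))  # ㄇㄟˊ ㄍㄨㄢ ㄒㄧ  (whole-word entry wins)
print(to_zhuyin("關係"))    # ㄍㄨㄢ ㄒㄧˋ  (no entry for 關係, falls back to characters)
```

Greedy longest match is only a rough stand-in for a real segmenter: it can pick a wrong split when two dictionary words overlap, which is exactly the case a library like jieba handles better. Use `"".join(out)` instead of `" ".join(out)` if you want the unspaced form shown in the question.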