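I am working with Japanese text and have two requirements.
First, digits should become double-byte (full-width) characters, e.g.:
東京都中央区晴海1丁目8番11号
expected output: 東京都中央区晴海<1>丁目<8>番<11>号, where every part marked <> should contain double-byte characters, i.e. 東京都中央区晴海１丁目８番１１号.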
Second, company names should come out as single-byte characters, e.g.:
ADORES,Inc.
expected output: ADORES, INC.
I am reading this data from a CSV file that contains nearly 300 columns; only 3 of those columns need these conversions, and the rest should remain unchanged.
I found the code below online, but it throws an error (raw_comp_name contains the data from the CSV):
raw_comp_name.encode(encoding='utf-8').decode('ascii')
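For context, a quick standard-library check shows why that line cannot do the conversion: encoding to UTF-8 and then decoding as ASCII only reinterprets bytes, it never changes a character's width, and any non-ASCII character (kanji, full-width letters) makes the ASCII decode fail. The sample strings and the helper name try_ascii_roundtrip are illustrative only.

```python
# Full-width (double-byte) and half-width (single-byte) letters are different code points.
full = "ＡＤＯＲＥＳ"
half = "ADORES"
print(hex(ord(full[0])), hex(ord(half[0])))  # 0xff21 0x41


def try_ascii_roundtrip(text: str) -> str:
    """Reproduce the encode/decode pattern from the question and report the result."""
    try:
        return text.encode(encoding="utf-8").decode("ascii")
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


print(try_ascii_roundtrip(half))  # plain ASCII passes through unchanged
print(try_ascii_roundtrip(full))  # full-width letters are 3 bytes each in UTF-8, so decoding fails
print(try_ascii_roundtrip("東京都中央区晴海1丁目8番11号"))  # kanji fail the same way
```

So the width conversion has to be done explicitly, character by character, rather than through an encoding round trip.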
Japanese text distinguishes between single-byte (half-width, hankaku) and double-byte (full-width, zenkaku) characters. Double-byte characters are displayed twice as wide as normal alphabetic characters.
You can get more details from this link.
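To make the terms concrete: in legacy Japanese encodings such as Shift_JIS, half-width characters take one byte and full-width characters take two, and Unicode records the display width in its East Asian Width property. A small standard-library check (the helper name describe is just for illustration):

```python
import unicodedata


def describe(ch: str):
    """Return (bytes in Shift_JIS, Unicode East Asian Width class) for one character."""
    sjis_len = len(ch.encode("shift_jis"))
    # 'Na' = narrow (half-width), 'F' = fullwidth variant, 'W' = wide (e.g. kanji)
    width = unicodedata.east_asian_width(ch)
    return sjis_len, width


for ch in ["1", "１", "A", "Ａ", "東"]:
    sjis_len, width = describe(ch)
    print(f"{ch!r}: {sjis_len} byte(s) in Shift_JIS, east_asian_width={width}")
```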
You can use the jaconv module (pip install jaconv). It has functions for both single-byte to double-byte conversion (hankaku2zenkaku) and double-byte to single-byte conversion (zenkaku2hankaku). See the module documentation link for more details.
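For reference, the ASCII part of such a conversion is a fixed code-point offset: the full-width forms U+FF01..U+FF5E sit exactly 0xFEE0 above printable ASCII U+0021..U+007E, and the ideographic space U+3000 pairs with the ASCII space. The standard-library sketch below only illustrates that mapping; it is not how jaconv is implemented, it leaves kana untouched, and the function names are hypothetical.

```python
# Printable ASCII '!'..'~' (U+0021..U+007E) maps to full-width '！'..'～' (U+FF01..U+FF5E)
OFFSET = 0xFEE0
HALF_TO_FULL = {code: code + OFFSET for code in range(0x21, 0x7F)}
HALF_TO_FULL[0x20] = 0x3000  # ASCII space <-> ideographic space
FULL_TO_HALF = {full: half for half, full in HALF_TO_FULL.items()}
DIGITS_TO_FULL = {code: code + OFFSET for code in range(ord("0"), ord("9") + 1)}


def to_zenkaku_ascii(text: str, digits_only: bool = False) -> str:
    """Half-width ASCII -> full-width; with digits_only=True only 0-9 are converted."""
    return text.translate(DIGITS_TO_FULL if digits_only else HALF_TO_FULL)


def to_hankaku_ascii(text: str) -> str:
    """Full-width ASCII variants -> half-width; kanji and kana pass through unchanged."""
    return text.translate(FULL_TO_HALF)


print(to_zenkaku_ascii("東京都中央区晴海1丁目8番11号", digits_only=True))
print(to_hankaku_ascii("ＡＤＯＲＥＳ，Ｉｎｃ．"))
```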
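Example code below. Note that by default both functions only convert kana, so digit=True and ascii=True have to be passed explicitly:
import jaconv
# Single-byte -> double-byte: digit=True is needed, digits are left alone by default
hankaku_text = '東京都中央区晴海1丁目8番11号'
converted_zenkaku = jaconv.hankaku2zenkaku(hankaku_text, digit=True)
print(converted_zenkaku)
# Double-byte -> single-byte: ascii=True converts letters and punctuation,
# kana=False keeps full-width katakana as it is
zenkaku_text = 'ＡＤＯＲＥＳ，Ｉｎｃ．'
converted_hankaku = jaconv.zenkaku2hankaku(zenkaku_text, kana=False, ascii=True, digit=True)
print(converted_hankaku)
output:
東京都中央区晴海１丁目８番１１号
ADORES,Inc.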
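Since only 3 of the roughly 300 columns need converting, one option is to rewrite just those cells while streaming the CSV and copy everything else unchanged. The standard-library sketch below assumes hypothetical column names (address, building, company_name) and guesses the second requirement from the expected output (upper case and one space after the comma). It uses Unicode NFKC normalization instead of jaconv for the double-byte to single-byte step; NFKC also rewrites other compatibility characters, so check that this is acceptable for your data.

```python
import csv
import io
import re
import unicodedata
from typing import TextIO

# Requirement 1: half-width digits 0-9 -> full-width ０-９
DIGITS_TO_FULL = str.maketrans("0123456789", "０１２３４５６７８９")


def address_to_zenkaku(value: str) -> str:
    """Make every digit double-byte; kanji and other characters are left as-is."""
    return value.translate(DIGITS_TO_FULL)


def company_to_hankaku(value: str) -> str:
    """Requirement 2 under assumed rules: single-byte, upper case, one space after commas.

    NFKC folds full-width ASCII to ASCII, but it also rewrites other compatibility
    characters (half-width katakana to full-width, circled digits to plain digits).
    """
    half = unicodedata.normalize("NFKC", value).upper()
    return re.sub(r",\s*", ", ", half)


# Hypothetical column names: only these are rewritten, every other column is copied unchanged.
CONVERTERS = {
    "address": address_to_zenkaku,
    "building": address_to_zenkaku,
    "company_name": company_to_hankaku,
}


def convert_csv(src: TextIO, dst: TextIO) -> None:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for column, convert in CONVERTERS.items():
            if row.get(column):  # skip columns that are absent or empty
                row[column] = convert(row[column])
        writer.writerow(row)


sample = io.StringIO(
    "id,address,building,company_name,note\n"
    "1,東京都中央区晴海1丁目8番11号,,ＡＤＯＲＥＳ，Ｉｎｃ．,keep 123\n"
)
out = io.StringIO()
convert_csv(sample, out)
print(out.getvalue())
```

For a real file, open both files with newline="" and the file's actual encoding (a Japanese CSV may be cp932 rather than UTF-8), then pass the two file objects to convert_csv. Swapping company_to_hankaku for a jaconv-based function works the same way.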